the Probability of a Successful Guess

Ibrahim Issa and Aaron B. Wagner 


Abstract —The secrecy of a communication system in which 
both the legitimate receiver and an eavesdropper are allowed 
some distortion is investigated. The secrecy metric considered is 
the exponent of the probability that the eavesdropper estimates 
the source sequence successfully within an acceptable distortion 
level. The problem is first studied when the transmitter and the 
legitimate receiver do not share any key and the transmitter 
is not subject to a rate constraint, which corresponds to a 
stylized model of a side channel and reveals connections to source 
coding with side information. The setting is then generalized to 
include a shared secret key between the transmitter and the 
legitimate receiver and a rate constraint on the transmitter, 
which corresponds to the Shannon cipher system. A single-letter 
characterization of the highest achievable exponent is provided, 
and asymptotically-optimal strategies for both the primary user 
and the eavesdropper are demonstrated. 


I. Introduction 

To compromise the security of a communication network, 
an eavesdropper need not have direct access to the decrypted 
content of the transmitted packets. In fact, simply monitoring 
and analyzing the network flow may help an eavesdropper 
deduce sensitive information. For example, Song et al. [1]
show that the Secure Shell (SSH) is vulnerable to what are
called timing attacks. In SSH, each keystroke is immediately
sent to the remote machine, and an eavesdropper can thus
observe the timing of the keystrokes. It is shown that this
information can be used to significantly speed up the search
for passwords, and it is estimated that each consecutive pair
of keystrokes leaks around 1 bit of information. Zhang and
Wang [2] enhance the attack proposed in [1], and apply it
in the setting of multi-user operating systems, in which a
malicious user eavesdrops on other users’ keystrokes. Timing-based
attacks also appear in various other settings, including
compromising the anonymity of users in networks [3], [4],
information leakage in the context of shared schedulers [5],
and information leakage in the context of on-chip networks [6].

In this paper, we consider a stylized model of such information
leakage problems, and call it the information blurring
system. The setup, shown in Figure 1, consists of the following.
A transmitter observes a sequence $X^n$, which corresponds
roughly to the original timing vector, and maps it to a sequence
$Y^n$ that is observed by both the legitimate receiver and an
eavesdropper. The mapping must almost surely satisfy a distortion
constraint, which corresponds to some quality constraints
imposed by the network (e.g., delay constraints). We do not
require the mapping to be causal, as the intent of this work is
to provide fundamental limits for a simplified version of the
information leakage problem. In broad terms, the transmitter
wants to blur the information in $X^n$ (hence the name), so
that it is no longer useful for the eavesdropper. For example,
one approach is to artificially add noise to the input sequence.
In that sense, the problem is related to methods for ensuring
differential privacy, in which a curator wants to publicly
release statistical information about a given population without
compromising the privacy of its individuals [7], [8].

Fig. 1. Information blurring system: both the legitimate receiver and the
eavesdropper are allowed a certain distortion level.

Upon observing the output $Y^n$, the eavesdropper, who
knows the source statistics and the transmitter’s encoding function,
tries to estimate $X^n$. We introduce a distortion function
and consider the eavesdropper’s estimate to be successful if
the distortion it incurs is below a given level. Hence, we
measure the secrecy guaranteed by a given scheme via the
probability that the eavesdropper makes a successful guess.
The primary user (i.e., the transmitter and legitimate-receiver pair)
then aims to minimize that probability. Since computing the
exact probability is quite difficult, this paper will be mainly
concerned with asymptotic analysis: we will derive the rate of
decay (i.e., the exponent) of the probability of a successful
guess. Other metrics for quantifying secrecy exist in the
literature; we discuss the motivations and the shortcomings
of the commonly used ones in Section II.

For a discrete memoryless source (DMS), we provide
a single-letter characterization of the optimal exponent (cf.
Theorem 1). We show that the problem is related to source
coding with side information. Essentially, the eavesdropper
first attempts to guess the joint type of $X^n$ and $Y^n$. S/he then
“pretends” that $Y^n$ is received through a memoryless channel
whose probability law is the conditional probability
$P(Y|X)$ induced by the joint type. The problem can be
viewed at this point as compression with side information,
so the eavesdropper picks a codeword from an optimal rate-distortion
code. The primary user’s objective, therefore, is to
supply the “worst” side information. Moreover, we demonstrate
asymptotically-optimal universal schemes for both the
primary user and the eavesdropper. The schemes are universal
in the sense that they do not depend on the source statistics.
In particular, the transmitter operates on a type-by-type basis,
and associates with each type a rate-distortion code, the
construction of which is based on the conditional probability
law that provides the “worst” side information given that type.

Next, we extend the study to the setup in which the
transmitter is subject to a rate constraint, and the transmitter
and the legitimate receiver have access to a common source of
randomness, called the key. The eavesdropper has full knowledge
of the encryption system, except for the realization of the
key and the realization of $X^n$. The setup is shown in Figure 2,
and the special case in which the legitimate receiver and the
eavesdropper must reconstruct the source exactly is known as
the Shannon cipher system [9]. Since the transmitter is subject
to a rate constraint, we allow the primary user to violate
the distortion constraint, but restrict the probability of such
an event to be exponentially decaying. We again derive a single-letter
characterization of the optimal exponent (cf. Theorem 2),
and demonstrate asymptotically-optimal strategies for both the
primary user and the eavesdropper. In particular, similarly to
the previous setting, the transmitter operates on a type-by-type
basis and associates with each type a rate-distortion code, the
construction of which is based on the conditional probability
law that provides the worst side information and satisfies the
rate constraint (however, types with low enough probability
are discarded, by associating a dummy message with all the
source sequences belonging to such types). To make use of
the shared key, we (randomly) generate many instances of
such codes, and use the secret key to randomize the choice
of the code selected for encoding $X^n$. We also investigate
conditions under which the resulting codes are optimal rate-distortion
codes. As for the eavesdropper, we show that one
of the following two schemes is optimal. The first consists of
generating a blind guess, i.e., completely ignoring the public
message. The second consists of guessing the value of the key
to reproduce the reconstruction at the legitimate receiver, and
then applying the strategy developed in the first part of the
paper.

Fig. 2. The Shannon cipher system with lossy communication: the transmitter
and the legitimate receiver have access to a common key $K \in \{0,1\}^{nr}$, which consists of
$nr$ purely random bits, where $r$ is called the key rate. The transmitter encodes
$X^n$ using $K$, and sends a message $M$ through a noiseless public channel of
rate $R$. Both the legitimate receiver and the eavesdropper are allowed a certain
level of distortion. The legitimate receiver generates the reconstruction $Y^n$
based on $M$ and $K$, whereas the eavesdropper has access to $M$ only to
produce an estimate $V^n$.

We note that Theorem 2 subsumes Theorem 1 by setting the
key rate to zero and the channel rate to be high enough.
We nevertheless present them separately for two reasons. First, we
believe the information blurring system to be of independent
interest, as it corresponds to problems different from the
Shannon cipher system (e.g., the SSH timing attack). As such,
Theorem 1 can serve as a baseline for future refinements
of this model (say, by requiring the encoding to be causal).
Second, it significantly simplifies the exposition of the
results, by first revealing the connection to source coding with
side information and then introducing the key and the rate
constraint.

Finally, it should be noted that Weinberger and Merhav
studied the Shannon cipher system with lossy communication
[10], [11] (i.e., the setup of the second part of this paper),
and independently suggested the same secrecy metric we proposed.
Furthermore, they allowed a variable key rate. In their
initial work [10], they derived the optimal exponent under the
assumption that the distortion constraint of the eavesdropper is
more lenient than that of the legitimate receiver (which makes
the no-key problem degenerate). Our initial work (an earlier
submission of the current paper) characterized the optimal
exponent only under certain conditions (including the no-key
case), which are not satisfied in the setting of [10], and provided
general upper and lower bounds. As such, those results
were not comparable with those of [10]. Weinberger and Merhav
later [11] generalized their result to characterize the exponent
in general, as is done here. However, the scheme suggested
herein and its subsequent analysis are significantly simpler.
In particular, our scheme uses a traditional random coding
construction followed by a separate key-based randomization.


II. Secrecy Metric


The information-theoretic study of secrecy systems was
initiated by Shannon in [9]. Shannon derived the following
negative result: ensuring perfect secrecy, i.e., making the
source sequence $X^n$ and the public message $M$ (cf. Figure 2)
statistically independent, requires that the key rate be at least
as large as the message rate.

As opposed to perfect secrecy, the notion of “partial” secrecy
is more difficult to quantify. However, the impracticality
of ensuring perfect secrecy, as implied by Shannon’s result,
means that developing such a notion is important from a
practical point of view as well as a theoretical one. Shannon
used equivocation, i.e., the conditional entropy of the source
sequence given the public message, $H(X^n|M)$, as a “theoretical
secrecy index”. A main motivation for equivocation
was the similarity between the deciphering problem for the
eavesdropper in the secrecy setting and the decoding problem
for the receiver in the standard noisy communication setting [9].
Equivocation has subsequently been used as a secrecy
metric in several works [12]-[18]. However, its use is not well
motivated operationally. It only provides a lower bound on the
exponent of the list size that the eavesdropper must generate
to reliably include the source sequence. Moreover, Massey
showed in [19] that the expected number of guesses that need
to be made to correctly guess a discrete random variable $X$
may be arbitrarily large for arbitrarily small $H(X)$.
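The gap between entropy and guessing effort can be seen numerically. The following is a minimal sketch, not taken from the paper: the "spiky" distribution (one heavy atom plus many equally light atoms) and the names `spiky_pmf`, `entropy_bits`, and `expected_guesses` are illustrative assumptions chosen to make the effect visible.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a PMF given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_guesses(probs):
    """Expected number of guesses when values are tried in decreasing-probability order."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


def spiky_pmf(eps, m):
    """Hypothetical PMF: one atom of mass 1 - eps, plus m atoms sharing mass eps uniformly."""
    return [1.0 - eps] + [eps / m] * m


pmf = spiky_pmf(eps=0.01, m=100_000)
print("entropy (bits):", entropy_bits(pmf))
print("expected guesses:", expected_guesses(pmf))
```

Increasing `m` while shrinking `eps` appropriately keeps the entropy small while the expected number of guesses keeps growing, which is the qualitative behavior the text attributes to Massey's observation.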

Merhav and Arikan [20] proposed a more direct approach:
they consider an i.i.d. source and measure secrecy by
the expected number of guesses that the eavesdropper needs
to make before finding the correct source sequence, which
they denote by $E[G(X^n|M)]$, where $G(\cdot|m)$ is a “guessing”
function defined for each possible public message $m$. This is
intended to capture the scenario in which the eavesdropper has
a testing mechanism to check whether or not his/her guess is
correct. Such a mechanism exists, for example, if the source
message is a password to a computer account. When the
source is discrete and memoryless, and the transmitter and the
legitimate receiver have access to $nr$ purely random common
bits (where $r$ is called the key rate), the optimal exponent of
$E[G(X^n|M)]$ is found to be [20, Theorem 1]:

$$E(P,r) = \lim_{n\to\infty} \frac{1}{n}\log E[G(X^n|M)] = \max_{Q}\,\{\min\{H(Q), r\} - D(Q\|P)\}, \tag{1}$$

where $P$ is the source distribution and $D(\cdot\|\cdot)$ is the Kullback-Leibler
(KL) divergence. Two issues arise with this metric.
First, even if a testing mechanism exists, any practical system
would only allow a small number of incorrect inputs. Thus,
it is not clear how to interpret an exponentially large number
of guesses. Second, and more importantly, it turns out that
even highly insecure systems can appear to be secure under
this metric. Indeed, by modifying the asymptotically-optimal
scheme proposed in [20], we can construct a scheme for the
primary user that allows the eavesdropper to find the source
sequence correctly with high probability by the first guess, and
yet achieves the optimal exponent in (1). The scheme proposed
in [20] operates on the source sequences on a type-by-type
basis, and it yields:

$$E[G(X^n|M) \mid X^n \in T_Q] \ge 2^{n\min\{r, H(Q)\} - o(n)}, \tag{2}$$

where $o(n)/n \to 0$ as $n \to \infty$, and $T_Q$ is the type class of a
given type $Q$, i.e., the set of sequences with empirical distribution
$Q$. Averaging over the probabilities of $\{T_Q\}$ yields the
exponent in (1) (as a lower bound). However, this means that it
is enough to apply the proposed scheme to the type class $T_Q$
that achieves the maximum of $[\min\{H(Q), r\} - D(Q\|P)]$,
whereas sequences belonging to other type classes can be sent
with no encoding whatsoever with no effect on the exponent.
Therefore, only a set with vanishing probability is encoded,
whereas sequences outside that set are immediately known by
the eavesdropper.
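To make the expression in (1) concrete, the sketch below evaluates it by grid search for a binary source. It is only an illustration under stated assumptions: the source is taken to be Bernoulli with parameter `p` strictly between 0 and 1, the maximization is restricted to a finite grid of Bernoulli distributions, and the function names are hypothetical.

```python
import math


def h2(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def kl2(q, p):
    """Binary KL divergence D(Ber(q) || Ber(p)) in bits, assuming 0 < p < 1."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total


def guessing_exponent(p, r, grid=10_000):
    """Grid-search approximation of max over Q = Ber(q) of min{H(Q), r} - D(Q || P)."""
    return max(min(h2(k / grid), r) - kl2(k / grid, p) for k in range(grid + 1))


print(guessing_exponent(p=0.1, r=0.2))
print(guessing_exponent(p=0.1, r=2.0))
```

The maximizing `q` in such a search identifies the single type class that dominates the exponent, which is the type class the modified scheme described above would need to protect.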

1 Merhav and Arikan actually characterize, for any $\rho > 0$, the exponent of
$E[G^\rho(X^n|M)]$. This more general result can still yield large exponents for
systems that are highly insecure, although one could potentially address this
issue by requiring schemes that yield large exponents simultaneously over a
range of $\rho$ values.

A different approach, based on rate-distortion theory, was
adopted by Yamamoto in [21]. A distortion function is introduced
and the secrecy of a given scheme is measured by the
minimum attainable expected distortion at the eavesdropper.
Also, a certain level of distortion, possibly corresponding to
a different distortion function, is allowed at the legitimate
receiver. An earlier work by Yamamoto [22] considered the
special case where no key is available, under the same secrecy
metric. A standard example, discussed and generalized in [23],
shows why expected distortion is inadequate: Suppose $X^n$ is
a sequence of independent and identically distributed bits with
$X_i \sim \mathrm{Ber}(1/2)$, the transmitter and the legitimate receiver have
access to one common bit $K \sim \mathrm{Ber}(1/2)$, and the distortion
function is the Hamming distance. The transmitter then sends
the sequence $X^n$ as is if $K = 0$, and flips all its bits if $K = 1$.
The induced expected distortion at the eavesdropper is then
equal to 1/2, which is also the maximum expected distortion
that the eavesdropper can possibly incur, since it is achievable
even if the public message is not observed. However, this
“optimal” scheme in fact reveals a lot about the true source
sequence; namely, it is one of only two possible candidates.

To overcome this limitation of expected distortion, Schieler
and Cuff [24], [25] allow the eavesdropper to generate an
exponentially-sized list of estimates and propose the expected
minimum distortion over the list as a secrecy metric. It is
not clear, however, how to operationally interpret a list of
exponential size. It is shown that this setting is equivalent to
the following: there exists a “henchman” that has access to the
source sequence $X^n$ and public message $M$, and can transmit
$nR_L$ bits to the eavesdropper, with secrecy measured by the
minimum expected distortion. However, this metric leads to
a degenerate trade-off between the key rate $r$, the allowed
list exponent (henchman rate) $R_L$, and the expected minimum
distortion in the list $D_e$. For example, if the legitimate receiver
must reconstruct $X^n$ losslessly, one of two cases occurs (see
[25, Theorem 1]): If $r > R_L$, $D_e$ is given by the rate-distortion
function at $R_L$. If $r \le R_L$, $D_e = 0$ since the eavesdropper can
trivially find the exact sequence by listing all the possible keys.
This fails to capture that, even when $R_L < r$, the eavesdropper
can still list $2^{nR_L}$ possible keys and thus recover exactly the
correct sequence with probability at least $2^{n(R_L - r)}$. As $R_L$
approaches $r$, this probability can be made to decay arbitrarily
slowly. It is worth noting that Schieler and Cuff also consider
a causal disclosure setting [23], in which the eavesdropper
observes, at time $i$, the public message $M$ and $X^{i-1}$. Although
this is more robust than expected distortion, it captures only
a limited range of practical scenarios.
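The one-bit-key example discussed above (send $X^n$ as is if $K = 0$, flip every bit if $K = 1$) can be checked by brute-force enumeration for a small block length. This is only a sketch under stated assumptions: the eavesdropper's guess is taken to be the observed message itself, which is one illustrative strategy, and the function name is hypothetical.

```python
from itertools import product


def flip_scheme_stats(n):
    """Enumerate the one-bit-key scheme for uniform x^n in {0,1}^n and uniform K.

    The eavesdropper's guess is simply the observed message (an illustrative choice).
    Returns (expected normalized Hamming distortion,
             probability of exact recovery,
             largest number of source sequences consistent with any one message).
    """
    total_distortion = 0.0
    exact_hits = 0
    candidates = {}
    outcomes = 0
    for x in product((0, 1), repeat=n):
        for k in (0, 1):
            m = tuple(b ^ k for b in x)  # message: x itself or its bitwise complement
            guess = m
            mismatches = sum(a != b for a, b in zip(x, guess))
            total_distortion += mismatches / n
            exact_hits += (mismatches == 0)
            candidates.setdefault(m, set()).add(x)
            outcomes += 1
    return (total_distortion / outcomes,
            exact_hits / outcomes,
            max(len(s) for s in candidates.values()))


print(flip_scheme_stats(4))
```

The enumeration shows the tension the text describes: the expected distortion is as large as it can be, yet each message leaves only two candidate source sequences and a single guess is exactly right half of the time.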

In this paper, we take a different approach. In many applications,
the eavesdropper has no way to verify if his/her estimate
is correct. This is particularly true in our main case of interest,
i.e., timing of events. Moreover, as mentioned before, most
practical systems allow only a small number of incorrect guesses
even if a testing mechanism exists. Therefore, we allow the
eavesdropper to make one guess only. Secrecy is then measured
by the probability that the guess is successful, i.e., that the
distortion incurred is below a given level. For the purposes of
the asymptotic analysis in this paper, we will study only the
exponent of the probability of a successful guess. A special
case of such analysis was considered by Merhav in [26].
In particular, [26] is concerned with necessary and sufficient
conditions for achieving the perfect secrecy exponent, which
is the exponent attained by the eavesdropper in the absence of
any observation. It is also restricted to the case in which both
the legitimate receiver and the eavesdropper must reconstruct
the source sequence exactly. Finally, a relevant earlier work
by Arikan and Merhav [27] considers the problem of blindly
guessing a random variable up to a distortion level and
characterizes the least achievable exponential growth of the
expected number of guesses.

III. Information Blurring System 

We consider the following secrecy system. Let $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{V}$
be the alphabets associated with the transmitter, the legitimate
receiver, and the eavesdropper, respectively. The transmitter
wants to provide the legitimate receiver with a quantized
version of an $n$-length message $X^n = (X_1, X_2, \ldots, X_n)$. It
thus generates a vector $Y^n$ through a (possibly randomized)
function $f$: $Y^n = f(X^n)$. For a given distortion function
$d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+$, the quantization is required to satisfy a
constraint of the form $d(X^n, Y^n) = \frac{1}{n}\sum_{i=1}^{n} d(X_i, Y_i) \le D$
for a given distortion level $D$. The restriction is imposed
on each realization of $(X^n, Y^n)$. An eavesdropper, with an
associated distortion function $d_e : \mathcal{X} \times \mathcal{V} \to \mathbb{R}_+$, also observes
$Y^n$ and generates a guess $V^n = g(Y^n)$, aiming to have
$d_e(X^n, V^n) = \frac{1}{n}\sum_{i=1}^{n} d_e(X_i, V_i) \le D_e$ for a given distortion
level $D_e$.

It is assumed that the eavesdropper knows the source statistics
and the primary user’s encoding function $f$. The secrecy
metric we adopt is the probability that the eavesdropper makes
a successful guess, i.e., $\Pr\left(d_e(X^n, V^n) \le D_e\right)$. The primary
user’s objective is to minimize this probability. So, the problem
can be written as:

$$\min_{f_n}\,\max_{g_n}\,\Pr\left(d_e\big(X^n, g_n(f_n(X^n))\big) \le D_e\right).$$

We characterize the highest achievable exponent of the probability
of a successful guess under the following assumptions:

(A1) The alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{V}$ are finite.

(A2) The source is memoryless, and without loss of generality,
its distribution has full support.

(A3) The distortion functions $d$ and $d_e$ are bounded, i.e.,
there exist $D_{\max}$ and $D_{e,\max}$ such that, for all $x \in \mathcal{X}$,
$y \in \mathcal{Y}$, and $v \in \mathcal{V}$, $d(x,y) \le D_{\max}$ and
$d_e(x,v) \le D_{e,\max}$. Moreover, $D \ge D_{\min}$, where
$D_{\min} = \max_{x \in \mathcal{X}} \min_{y \in \mathcal{Y}} d(x,y)$. Similarly, $D_e \ge D_{e,\min}$,
where $D_{e,\min} = \max_{x \in \mathcal{X}} \min_{v \in \mathcal{V}} d_e(x,v)$.
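The thresholds in (A3) are simple to compute from a distortion matrix. The sketch below is illustrative only: it assumes the distortion function is stored as a nested list `distortion[x][y]`, and the helper names are hypothetical.

```python
def d_min(distortion):
    """Max over x of min over y of d(x, y), with d given as a nested list d[x][y]."""
    return max(min(row) for row in distortion)


def d_max(distortion):
    """Largest entry of the distortion matrix, i.e., a valid bound for (A3)."""
    return max(max(row) for row in distortion)


hamming = [[0, 1],
           [1, 0]]
skewed = [[1, 2],
          [3, 0.5]]  # hypothetical distortion in which no symbol is reproduced for free

print(d_min(hamming), d_max(hamming))
print(d_min(skewed), d_max(skewed))
```

For the Hamming distortion the threshold is zero, so every distortion level is admissible; for a matrix like `skewed`, levels below `d_min(skewed)` would leave some source symbol with no admissible reproduction.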

We denote the optimal exponent by $E(P, D, D_e)$, where $P$
is the source distribution, i.e.,

$$E(P,D,D_e) = \lim_{n\to\infty}\,\max_{\{f_n\}}\,\min_{\{g_n\}}\, -\frac{1}{n}\log\Pr\left(d_e\big(X^n, g_n(f_n(X^n))\big) \le D_e\right), \tag{3}$$

where $\{f_n\}$ is restricted to the class of functions ensuring the
feasibility of the primary user’s problem. The existence of the
limit will be seen later.

We will show that the problem is related to source coding
with side information, where $Y^n$ acts as side information
for the eavesdropper. Therefore, the primary user’s job is to
provide the “worst” side information subject to a distortion
constraint of his/her own. To this end, we denote the conditional
rate-distortion function as:

$$R(P_{XY}, D_e) = \min_{P_{V|X,Y}:\; E[d_e(X,V)] \le D_e} I(X;V|Y), \tag{4}$$

and define the quantity $R(P_X, D, D_e)$ as:

$$R(P_X, D, D_e) = \max_{P_{Y|X}:\; E[d(X,Y)] \le D} R(P_{XY}, D_e). \tag{5}$$

Roughly speaking, when the joint type of $X^n$ and $Y^n$
is $P_{XY}$, the eavesdropper can restrict the guessing space
to $2^{nR(P_{XY}, D_e)}$ reconstruction sequences, knowing that at
least one of them must satisfy the distortion constraint. The
maximization in (5) corresponds to the primary user’s goal
of maximizing that quantity.

We prove the following properties of $R(P_{XY}, D_e)$ and
$R(P_X, D, D_e)$ in Appendix A.
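The objective inside the minimization in (4) can be evaluated directly for any candidate joint distribution. The following sketch is not the minimization itself: it only computes $I(X;V|Y)$ and the expected eavesdropper distortion for a given joint PMF, assumed here to be stored as a dictionary keyed by `(x, y, v)`. The function names and the three toy distributions are illustrative assumptions.

```python
import math
from collections import defaultdict


def conditional_mutual_information(p_xyv):
    """I(X; V | Y) in bits for a joint PMF given as {(x, y, v): probability}."""
    p_y = defaultdict(float)
    p_xy = defaultdict(float)
    p_yv = defaultdict(float)
    for (x, y, v), p in p_xyv.items():
        p_y[y] += p
        p_xy[(x, y)] += p
        p_yv[(y, v)] += p
    total = 0.0
    for (x, y, v), p in p_xyv.items():
        if p > 0:
            total += p * math.log2(p * p_y[y] / (p_xy[(x, y)] * p_yv[(y, v)]))
    return total


def expected_distortion(p_xyv, d_e):
    """E[d_e(X, V)] under the joint PMF, for a distortion function d_e(x, v)."""
    return sum(p * d_e(x, v) for (x, _y, v), p in p_xyv.items())


def hamming(x, v):
    return 0 if x == v else 1


# Useless side information: Y independent of X, and V = X.
useless = {(x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}
# Perfect side information: Y = X, and V = X.
perfect = {(x, x, x): 0.5 for x in (0, 1)}
# Blind guess: V = 0 regardless of X, Y independent of X.
blind = {(x, y, 0): 0.25 for x in (0, 1) for y in (0, 1)}

for name, pmf in (("useless", useless), ("perfect", perfect), ("blind", blind)):
    print(name, conditional_mutual_information(pmf), expected_distortion(pmf, hamming))
```

With useless side information, exact reconstruction costs a full bit of conditional mutual information, while with perfect side information it costs nothing; this is the sense in which the primary user wants to supply the "worst" side information.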

Proposition 1: In the following statements, the domains of
$D$ and $D_e$ are $[D_{\min}, +\infty)$ and $[D_{e,\min}, +\infty)$, respectively.

(P1) For fixed $P_{XY}$, $R(P_{XY}, D_e)$ is a finite-valued,
non-increasing convex function of $D_e$. Furthermore,
$R(P_{XY}, D_e)$ is a uniformly continuous function of the
pair $(P_{XY}, D_e)$.

(P2) For fixed $P_X$, $R(P_X, D, D_e)$ is a finite-valued function
of $(D, D_e)$. Moreover, for fixed $D_e$, $R(P_X, D, D_e)$ is a
uniformly continuous function of the pair $(P_X, D)$.

(P3) $R_e(P_X, D_e) - R(P_X, D) \le R(P_X, D, D_e) \le R_e(P_X, D_e)$,
where $R(P_X, D)$ and $R_e(P_X, D_e)$ are the
rate-distortion functions corresponding to the distortion
constraints $d$ and $d_e$, respectively.

Our main result is the characterization of the optimal exponent
as follows:

Theorem 1: Under assumptions (A1)-(A3), for any DMS
$P$, and distortion functions $d$ and $d_e$ with associated distortion
levels $D \ge D_{\min}$ and $D_e \ge D_{e,\min}$, corresponding respectively
to the primary user and the eavesdropper:

$$E(P,D,D_e) = \min_{Q}\, D(Q\|P) + R(Q, D, D_e), \tag{6}$$

where $Q$ ranges over all probability distributions on the source
alphabet, and $R(Q, D, D_e)$ is as defined in (5).

Remark 1: We do not require any $\epsilon$-backoff for $D$ or $D_e$
to characterize the associated exponent.

An interesting feature of Theorem 1 is the emergence of
mutual information as part of the solution in (6), even though
the setup does not include any rate constraints. Moreover, an
interesting contrast can be seen between the expression in (1)
for the expected number of guesses metric and the expression
in (6) for our metric. Indeed, the former evaluates the performance
of a given scheme asymptotically by a weighted best-case
scenario, whereas the latter evaluates it by a weighted
worst-case scenario.

As an application of the theorem, we compute the perfect
secrecy exponent, which we define as the best achievable
exponent when the primary user is not subject to any constraint,
and denote it by $E_0(P, D_e)$. To this end, we introduce a trivial
distortion function: $d(x, y) = 0$ for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.
Then, $R(Q, D) = 0$ for all $Q$ and all $D \ge 0$. It then follows
from (P3) of Proposition 1 that $R(Q, D, D_e) = R_e(Q, D_e)$
for all $Q$. Therefore,

$$E_0(P, D_e) = \min_{Q}\, D(Q\|P) + R_e(Q, D_e). \tag{7}$$
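As a worked instance of (7), the sketch below evaluates the perfect secrecy exponent for a binary source under Hamming distortion. It rests on assumptions not made in the text: the source is Bernoulli with parameter `p` strictly between 0 and 1, the eavesdropper's distortion is Hamming, the well-known binary rate-distortion formula $h(q) - h(D_e)$ (for $D_e < \min\{q, 1-q\}$, zero otherwise) is used for $R_e$, and the minimization is approximated on a grid. All function names are hypothetical.

```python
import math


def h2(q):
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def kl2(q, p):
    """Binary KL divergence D(Ber(q) || Ber(p)) in bits, assuming 0 < p < 1."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total


def rate_distortion_binary(q, d_e):
    """Rate-distortion function of Ber(q) under Hamming distortion, in bits."""
    if d_e >= min(q, 1 - q):
        return 0.0
    return h2(q) - h2(d_e)


def perfect_secrecy_exponent(p, d_e, grid=10_000):
    """Grid-search approximation of min over Q = Ber(q) of D(Q || P) + R_e(Q, D_e)."""
    return min(kl2(k / grid, p) + rate_distortion_binary(k / grid, d_e)
               for k in range(grid + 1))


print(perfect_secrecy_exponent(p=0.1, d_e=0.0))
print(perfect_secrecy_exponent(p=0.1, d_e=0.05))
```

With `d_e = 0.0` the search returns the exponent of blindly guessing the most likely sequence, which matches the interpretation of $E_0$ as the exponent attained without any observation.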

The next two subsections are devoted to proving Theorem 1.
We first propose a scheme for the primary user and show that
the induced exponent is lower-bounded by the right-hand side
of (6). From the eavesdropper’s point of view, this is a converse
result. Similarly, we propose a scheme for the eavesdropper
and show that the induced exponent is upper-bounded by the
right-hand side of (6), which establishes the desired result.

We set some notation for the remainder of the paper. In the
following, $\mathcal{Z}$ is an arbitrary discrete set, and $Z$ is a random
variable over $\mathcal{Z}$.

- The set of probability distributions over $\mathcal{Z}$ is denoted by
$\mathcal{P}_{\mathcal{Z}}$.

- For a sequence $z^n \in \mathcal{Z}^n$, $Q_{z^n}$ is the empirical PMF of
$z^n$, also referred to as its type.

- $\mathcal{Q}^n_{\mathcal{Z}}$ is the set of types in $\mathcal{Z}^n$, i.e., the set of rational
PMFs with denominator $n$.

- For $Q_Z \in \mathcal{Q}^n_{\mathcal{Z}}$, the type class of $Q_Z$ is $T_{Q_Z} = \{z^n \in
\mathcal{Z}^n : Q_{z^n} = Q_Z\}$.

- $E_Q[\cdot]$, $H_Q(\cdot)$, and $I_Q(\cdot\,;\cdot)$ denote respectively expectation,
entropy, and mutual information taken with respect
to distribution $Q$.

- All logarithms and exponentials are taken to the base 2.
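The type notation above can be made concrete with a few lines of code. This is a minimal sketch: the alphabet is passed explicitly as an ordered sequence, exact fractions are used so that types with denominator $n$ compare exactly, and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def empirical_type(seq, alphabet):
    """Empirical PMF (type) of seq as a tuple of exact fractions, ordered like alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return tuple(Fraction(counts[a], n) for a in alphabet)


def num_types(n, alphabet_size):
    """Number of types with denominator n over an alphabet of the given size."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)


def type_class_size(seq, alphabet):
    """Size of the type class containing seq (a multinomial coefficient)."""
    counts = Counter(seq)
    size = factorial(len(seq))
    for a in alphabet:
        size //= factorial(counts[a])
    return size


print(empirical_type("aab", "ab"))
print(num_types(3, 2), type_class_size("aab", "ab"))
```

The count returned by `num_types` grows only polynomially in `n` (it is at most `(n + 1) ** alphabet_size`), which is why sums over types do not affect exponents in the arguments that follow.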

A. Achievability for the Primary User (Eavesdropper’s Converse
Result)

Let

$$E^-(P, D, D_e) = \liminf_{n\to\infty}\,\max_{\{f_n\}}\,\min_{\{g_n\}}\, -\frac{1}{n}\log\Pr\left(d_e\big(X^n, g_n(f_n(X^n))\big) \le D_e\right). \tag{8}$$

We will show that $E^-(P, D, D_e) \ge \min_Q D(Q\|P) +
R(Q, D, D_e)$.

The primary user will operate on the source sequences on a
type-by-type basis. For each type $Q_X \in \mathcal{Q}^n_{\mathcal{X}}$, we create a rate-distortion
code $C^n_{Q_X}$ to cover each sequence in $T_{Q_X}$ as follows.
We associate with $Q_X$ a joint type $Q_{XY}$ from $\mathcal{Q}^n_{\mathcal{XY}}(Q_X, D)$,$^2$ where

$$\mathcal{Q}^n_{\mathcal{XY}}(Q_X, D) = \left\{P_{XY} \in \mathcal{Q}^n_{\mathcal{XY}} : P_X = Q_X,\; E_{P_{XY}}[d(X,Y)] \le D\right\}. \tag{9}$$

The code is then constructed from $T_{Q_Y}$ as given by the
following lemma, which bounds the size of the code.

Lemma 2: Given $\epsilon > 0$, there exists $n_0(\epsilon, |\mathcal{X}|, |\mathcal{Y}|)$
such that for any $n \ge n_0$, for each joint type $Q_{XY} \in
\mathcal{Q}^n_{\mathcal{XY}}$, there exists a code $(y^n_1, y^n_2, \ldots, y^n_N)$ such that $N \le
2^{n(I_{Q_{XY}}(X;Y) + \epsilon)}$, and for all $x^n \in T_{Q_X}$, there exists $i$
satisfying $(x^n, y^n_i) \in T_{Q_{XY}}$.

The proof is a refinement of the covering lemma [28, Lemma
2.4.1]. We later prove a stronger result, Lemma 9, in Appendix E.

Remark 2: One might be tempted to use an optimal rate-distortion
code for each type $Q_X$, presuming that this choice is
best at preserving secrecy since it achieves optimal compression,
i.e., it only sends the necessary information. However, the
problem is more subtle since the “redundancy of information”
depends on the eavesdropper’s distortion constraint $d_e$. The
optimal choice of $Q_{XY}$ will be revealed when analyzing the
eavesdropper’s optimal strategy.

Now, fix $\epsilon > 0$ and let $n$ be at least as large as $n_0$
in Lemma 2. We will denote by $C^n_{Q_X}$ the rate-distortion
code associated with type $Q_X$. Thus, the function $f$ of the
primary user is as follows: each sequence $x^n$ is mapped to a
sequence $y^n \in C^n_{Q_{x^n}}$ satisfying $Q_{x^n y^n} = Q_{XY}$ (where $Q_{XY}$
is associated with $Q_{x^n}$), and consequently $d(x^n, y^n) \le D$.

To determine the eavesdropper’s optimal guess, we define
$B_{D_e}(v^n) = \{x^n \in \mathcal{X}^n : d_e(x^n, v^n) \le D_e\}$. Then, for each
observed $y^n$, the optimal rule is given by

$$g(y^n) = \arg\max_{v^n} \sum_{x^n \in B_{D_e}(v^n)} p(x^n | y^n).$$

This can be understood as the MAP rule, and we denote it in the
remainder by$^3$ $g_o$ (where “o” stands for optimal). To upper-bound
the probability of a correct guess, we consider a genie-aided
rule that is aware of the type of the transmitted source
sequence. That is, the genie-aided MAP rule yields

$$g_o(y^n, Q_X) = \arg\max_{v^n} \sum_{x^n \in B_{D_e}(v^n) \cap T_{Q_X}} p(x^n | y^n, X^n \in T_{Q_X}).$$

Remark 3: One should not expect the upper bound to be
loose since there are only polynomially many types in $n$, so
that the exponent is not affected.

For a given $y^n$, let $f^{-1}_{Q_X}(y^n) = \{x^n \in T_{Q_X} : f(x^n) = y^n\}$
be the set of sequences in $T_{Q_X}$ that are mapped to it. Then,
the observation of $y^n$ implies that $X^n \in f^{-1}_{Q_X}(y^n)$, and the
genie-aided MAP rule makes a successful guess if $X^n \in
B_{D_e}(g_o(y^n, Q_X))$. Therefore, we will derive an upper bound
on the maximum possible size of the intersection of these two
sets. First, note that $x^n \in T_{Q_X}$ and $f(x^n) = y^n$ imply that
$Q_{x^n y^n} = Q_{XY}$, where $Q_{XY}$ is the joint type associated with
$Q_X$. So $f^{-1}_{Q_X}(y^n) \subseteq T_{Q_{X|Y}}(y^n) \triangleq \{x^n \in T_{Q_X} : (x^n, y^n) \in
T_{Q_{XY}}\}$. Now, consider any $v^n \in \mathcal{V}^n$:

$$\begin{aligned}
\left|B_{D_e}(v^n) \cap f^{-1}_{Q_X}(y^n)\right|
&\le \left|B_{D_e}(v^n) \cap T_{Q_{X|Y}}(y^n)\right| \\
&\overset{(a)}{=} \sum_{\substack{P_{XYV} \in \mathcal{Q}^n_{\mathcal{XYV}}:\\ P_{XY} = Q_{XY},\\ E_{P_{XYV}}[d_e(X,V)] \le D_e,\\ P_{YV} = Q_{y^n v^n}}} \;\sum_{\substack{x^n:\\ (x^n, y^n, v^n) \in T_{P_{XYV}}}} 1 \\
&\overset{(b)}{=} \sum_{\substack{P_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e):\\ P_{YV} = Q_{y^n v^n}}} \left|T_{P_{X|V,Y}}(v^n, y^n)\right| \\
&\le (n+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{V}|} \max_{\substack{P_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e):\\ P_{YV} = Q_{y^n v^n}}} \left|T_{P_{X|V,Y}}(v^n, y^n)\right| \\
&\overset{(c)}{\le} (n+1)^{|\mathcal{X}||\mathcal{Y}||\mathcal{V}|} \max_{\substack{P_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e):\\ P_{YV} = Q_{y^n v^n}}} 2^{n H_{P_{XYV}}(X|V,Y)},
\end{aligned} \tag{10}$$

where

(a) follows from the fact that $(x^n, y^n, v^n) \in T_{P_{XYV}} \Rightarrow
P_{YV} = Q_{y^n v^n}$.

(b) follows from the definition of $\mathcal{Q}^n(Q_{XY}, D_e)$ as:

$$\mathcal{Q}^n(Q_{XY}, D_e) = \left\{P_{XYV} \in \mathcal{Q}^n_{\mathcal{XYV}} : P_{XY} = Q_{XY},\; E_{P_{XYV}}[d_e(X,V)] \le D_e\right\}. \tag{11}$$

(c) follows from Lemma 1.2.5 in [28].

Therefore, for large enough $n$, we get

$$\left|B_{D_e}(v^n) \cap f^{-1}_{Q_X}(y^n)\right| \le \max_{P_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e)} 2^{n\left(H_{P_{XYV}}(X|V,Y) + \epsilon\right)}. \tag{12}$$

2 Assumption (A3) guarantees that $\mathcal{Q}^n_{\mathcal{XY}}(Q_X, D)$ is nonempty for any
$Q_X$.

3 The MAP rule depends on $f$ and thus should be denoted by $g_{o,f}$. Since
this is obvious, we drop the subscript $f$ for notational convenience.

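Let $P^*(Q_{XY})$ be the joint type achieving the max in (12),
where the dependence on $D_e$ is suppressed since it is fixed
throughout the analysis. We can now upper-bound the probability
that the eavesdropper makes a successful guess as
follows:

$$\begin{aligned}
\Pr\left(d_e\big(X^n, g_o(f(X^n))\big) \le D_e\right)
&\le \Pr\left(d_e\big(X^n, g_o(f(X^n), Q_{X^n})\big) \le D_e\right) \\
&= \sum_{x^n \in \mathcal{X}^n} P(x^n)\, \mathbb{1}\left\{x^n \in B_{D_e}\big(g_o(f(x^n), Q_{x^n})\big)\right\} \\
&= \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}}\; \sum_{y^n \in C^n_{Q_X}}\; \sum_{\substack{x^n \in T_{Q_X}:\\ f(x^n) = y^n}} P(x^n)\, \mathbb{1}\left\{x^n \in B_{D_e}\big(g_o(y^n, Q_X)\big)\right\} \\
&\overset{(a)}{\le} \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}}\; \sum_{y^n \in C^n_{Q_X}} 2^{-n\left(D(Q_X\|P) + H_{Q_X}(X)\right)}\, 2^{n\left(H_{P^*(Q_{XY})}(X|V,Y) + \epsilon\right)} \\
&\overset{(b)}{\le} \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} 2^{n\left(I_{Q_{XY}}(X;Y) + \epsilon - D(Q_X\|P) - H_{Q_X}(X)\right)}\, 2^{n\left(H_{P^*(Q_{XY})}(X|V,Y) + \epsilon\right)} \\
&= \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} 2^{n\left(-D(Q_X\|P) - H_{Q_{XY}}(X|Y) + H_{P^*(Q_{XY})}(X|V,Y) + 2\epsilon\right)} \\
&= \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} 2^{-n\left(D(Q_X\|P) + I_{P^*(Q_{XY})}(X;V|Y) - 2\epsilon\right)},
\end{aligned} \tag{13}$$

where

(a) follows from (12) and from the fact that every $x^n \in T_{Q_X}$ has probability $2^{-n(D(Q_X\|P) + H_{Q_X}(X))}$.

(b) follows from Lemma 2.

To interpret the exponent in (13), note that $P^*(Q_{XY})$
minimizes $I(X;V|Y)$ over $\mathcal{Q}^n(Q_{XY}, D_e)$ (this follows readily
from (12), since $H(X|Y)$ is fixed by $Q_{XY}$). Therefore, $I_{P^*(Q_{XY})}(X;V|Y)$ is roughly
$R(Q_{XY}, D_e)$. The eavesdropper’s scheme can then be seen
as picking a codeword from an optimal rate-distortion code
that uses side information generated according to $Q_{Y|X}$.
Since $Q_{XY}$ is the choice of the primary user, who is
interested in maximizing the exponents in (13), we define for
each $Q_X \in \mathcal{Q}^n_{\mathcal{X}}$:

$$Q^*(Q_X) \in \arg\max_{Q_{XY} \in \mathcal{Q}^n_{\mathcal{XY}}(Q_X, D)} I_{P^*(Q_{XY})}(X;V|Y), \tag{14}$$

where we have again suppressed the dependence on $D$ and
$D_e$ in the notation.

Remark 4: The maximization does not depend on the
source statistics, and consequently neither does the proposed
encoding function $f$.

With a slight abuse of notation, we rewrite $P^*(Q^*(Q_X))$ as
$P^*_n(Q_X)$ to get

$$\begin{aligned}
I_{P^*_n(Q_X)}(X;V|Y) &= \max_{Q_{XY} \in \mathcal{Q}^n_{\mathcal{XY}}(Q_X, D)} I_{P^*(Q_{XY})}(X;V|Y) \\
&= \max_{Q_{XY} \in \mathcal{Q}^n_{\mathcal{XY}}(Q_X, D)}\; \min_{Q_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e)} I_{Q_{XYV}}(X;V|Y).
\end{aligned} \tag{15}$$

We can now rewrite (13) as

$$\begin{aligned}
\Pr\left(d_e\big(X^n, g_o(f(X^n))\big) \le D_e\right)
&\le \sum_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} 2^{-n\left(D(Q_X\|P) + I_{P^*_n(Q_X)}(X;V|Y) - 2\epsilon\right)} \\
&\le (n+1)^{|\mathcal{X}|} \max_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} 2^{-n\left(D(Q_X\|P) + I_{P^*_n(Q_X)}(X;V|Y) - 2\epsilon\right)}.
\end{aligned} \tag{16}$$

Taking the limit as $n$ goes to infinity, and noting that $\epsilon$ is
arbitrary, we get

$$\begin{aligned}
E^-(P, D, D_e) &= \liminf_{n\to\infty}\,\max_{\{f_n\}}\,\min_{\{g_n\}}\, -\frac{1}{n}\log\Pr\left(d_e\big(X^n, g_n(f_n(X^n))\big) \le D_e\right) \\
&\ge \min_{Q}\, D(Q\|P) + R(Q, D, D_e),
\end{aligned} \tag{17}$$

where the last inequality follows from (16) together with the following proposition,
the proof of which is given in Appendix B.

Proposition 3:

$$\lim_{n\to\infty}\; \min_{Q_X \in \mathcal{Q}^n_{\mathcal{X}}} \left[D(Q_X\|P) + I_{P^*_n(Q_X)}(X;V|Y)\right] = \min_{Q}\, D(Q\|P) + R(Q, D, D_e).$$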

B. Converse for the Primary User (Eavesdropper's Achievability Result)

Let

$$E^+(P, D, D_e) = \limsup_{n \to \infty} \max_{\{f_n\}} \min_{\{g_n\}} -\frac{1}{n} \log \Pr\big(d_e(X^n, g_n(f_n(X^n))) \le D_e\big). \tag{18}$$

TABLE I
Summary of useful notation.

Notation                          Description
$R(P_{XY}, D_e)$                  $\min_{P_{V|X,Y}:\, E[d_e(X,V)] \le D_e} I(X;V|Y)$
$R(P_X, D, D_e)$                  $\max_{P_{Y|X}:\, E[d(X,Y)] \le D} R(P_{XY}, D_e)$
$B_{D_e}(v^n)$                    $\{x^n \in \mathcal{X}^n : d_e(x^n, v^n) \le D_e\}$
$\mathcal{Q}^n_{XY}(Q_X, D)$      $\{P_{XY} \in \mathcal{Q}^n_{XY} : P_X = Q_X,\ E_{P_{XY}}[d(X,Y)] \le D\}$
$\mathcal{Q}^n_{XY}(Q_Y, D)$      $\{P_{XY} \in \mathcal{Q}^n_{XY} : P_Y = Q_Y,\ E_{P_{XY}}[d(X,Y)] \le D\}$
$\mathcal{Q}^n(Q_{XY}, D_e)$      $\{P_{XYV} \in \mathcal{Q}^n_{XYV} : P_{XY} = Q_{XY},\ E_{P_{XYV}}[d_e(X,V)] \le D_e\}$


We will now show that $E^+(P, D, D_e) \le \min_Q D(Q\|P) + R(Q, D, D_e)$. This means that the eavesdropper can achieve the exponent in (6) for any function $f$ the primary user implements.

We propose a two-stage scheme for the eavesdropper. In the first stage, observing $y^n$, s/he tries to guess the joint type of $x^n$ and $y^n$ by choosing an element uniformly at random from the set $\mathcal{Q}^n_{XY}(Q_{y^n}, D)$, where

$$\mathcal{Q}^n_{XY}(Q_Y, D) = \{P_{XY} \in \mathcal{Q}^n_{XY} : P_Y = Q_Y,\ E_{P_{XY}}[d(X,Y)] \le D\}. \tag{19}$$

The correct joint type must fall in this set since the restriction $d(X^n, Y^n) \le D$ is imposed on each realization of $(X^n, Y^n)$. We denote the function corresponding to this stage by $g_1 : \mathcal{Y}^n \to \mathcal{Q}^n_{XY}$.

Remark 5: We differentiate between $\mathcal{Q}^n_{XY}(Q_Y, D)$ and $\mathcal{Q}^n_{XY}(Q_X, D)$ by their first argument. A summary of the different notations is given in Table I.

The eavesdropper then proceeds assuming $g_1(y^n)$ is the correct joint type. S/he randomly chooses a sequence from a set that covers $T_{Q_{X|Y}}(y^n)$. To this end, we associate with each joint type $Q_{XY}$ a joint type $Q_{XYV}$ from $\mathcal{Q}^n(Q_{XY}, D_e)$ (cf. Table I), and generate a sequence uniformly at random from $T_{Q_{V|Y}}(y^n)$, where $Q_{V|Y}$ is the conditional probability induced by $Q_{XYV}$. We denote the function corresponding to this stage by $g_2 : \mathcal{Y}^n \times \mathcal{Q}^n_{XY} \to \mathcal{V}^n$. Thus, $g(y^n) = g_2(y^n, g_1(y^n))$.

Remark 6: The above strategy does not depend on the specifics of the function $f$ implemented by the primary user, i.e., it only uses the fact that $d(X^n, f(X^n)) \le D$. It is also independent of the source statistics.

The following lemma lower-bounds the probability that $g_2(y^n, Q_{XY})$ generates a sequence $V^n$ satisfying $d_e(x^n, V^n) \le D_e$, for a given pair $(x^n, y^n) \in T_{Q_{XY}}$, i.e., assuming the eavesdropper guesses the joint type correctly.

Lemma 4: Given a joint type $Q_{XYV} \in \mathcal{Q}^n_{XYV}$ and $(x^n, y^n) \in T_{Q_{XY}}$, if $V^n$ is chosen uniformly at random from $T_{Q_{V|Y}}(y^n)$, then $\Pr(V^n \in T_{Q_{V|X}}(x^n)) \ge c_n 2^{-n I_{Q_{XYV}}(X;V|Y)}$, where $c_n = (n+1)^{-|\mathcal{X}||\mathcal{Y}||\mathcal{V}|}$.

Proof:

$$
\begin{aligned}
\Pr\big(V^n \in T_{Q_{V|X}}(x^n)\big) &\ge \Pr\big(V^n \in T_{Q_{V|XY}}(x^n, y^n)\big) \\
&= \frac{|T_{Q_{V|XY}}(x^n, y^n)|}{|T_{Q_{V|Y}}(y^n)|} \\
&\ge c_n \frac{2^{nH(V|X,Y)}}{2^{nH(V|Y)}} = c_n 2^{-nI(X;V|Y)},
\end{aligned}
$$

where the second inequality follows from Lemma 1.2.5 (the standard lower bound on the size of a conditional type class). ■

Since the eavesdropper is interested in maximizing this probability, s/he will associate, with each $Q_{XY}$, a joint type achieving the maximum:

$$P^*(Q_{XY}) \in \operatorname*{argmin}_{P_{XYV} \in \mathcal{Q}^n(Q_{XY}, D_e)} I(X;V|Y). \tag{20}$$

Note that this is the same joint type achieving the maximum in (12).

We can now lower-bound the probability that $x^n \in B_{D_e}(g(y^n))$, for a given pair $(x^n, y^n)$ satisfying $d(x^n, y^n) \le D$.

Lemma 5: Given $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ satisfying $d(x^n, y^n) \le D$, $\Pr(x^n \in B_{D_e}(g(y^n))) \ge c'_n 2^{-n I_{P^*(Q_{XY})}(X;V|Y)}$, where $c'_n = (n+1)^{-|\mathcal{X}||\mathcal{Y}|(|\mathcal{V}|+1)}$, $Q_{XY} = Q_{x^n y^n}$, and $g$ is as described above.

Proof:

$$
\begin{aligned}
\Pr\big(x^n \in B_{D_e}(g(y^n))\big)
&= \sum_{Q'_{XY} \in \mathcal{Q}^n_{XY}(Q_{y^n}, D)} P\big(g_1(y^n) = Q'_{XY}\big)\, P\big(x^n \in B_{D_e}(g_2(y^n, Q'_{XY}))\big) \\
&\ge P\big(g_1(y^n) = Q_{x^n y^n}\big)\, P\big(x^n \in B_{D_e}(g_2(y^n, Q_{x^n y^n}))\big) \\
&\ge (n+1)^{-|\mathcal{X}||\mathcal{Y}|}\, P\big(x^n \in B_{D_e}(g_2(y^n, Q_{XY}))\big) \\
&\ge (n+1)^{-|\mathcal{X}||\mathcal{Y}|(|\mathcal{V}|+1)}\, 2^{-n I_{P^*(Q_{XY})}(X;V|Y)},
\end{aligned}
$$

where the last inequality follows from Lemma 4. ■
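The counting step in the proof of Lemma 4 can be checked numerically for a small joint type. The sketch below is illustrative only: the function name `lemma4_check`, the dictionary encoding of the joint type, and the example counts are assumptions made here, not part of the paper. It computes the exact ratio $|T_{Q_{V|XY}}(x^n,y^n)| / |T_{Q_{V|Y}}(y^n)|$ with multinomial coefficients and compares it with the bound $c_n 2^{-nI(X;V|Y)}$.

```python
from itertools import product
from math import comb, log2


def multinomial(counts):
    """Number of sequences whose symbol counts equal `counts`."""
    total, result = 0, 1
    for c in counts:
        total += c
        result *= comb(total, c)
    return result


def lemma4_check(counts, X, Y, V):
    """Return (exact ratio of type-class sizes, Lemma 4 lower bound).

    counts[(x, y, v)] is the number of positions carrying the triple (x, y, v);
    this dictionary is a hypothetical encoding of the joint type Q_XYV.
    """
    n = sum(counts.values())
    n_xy = {(x, y): sum(counts.get((x, y, v), 0) for v in V) for x, y in product(X, Y)}
    n_yv = {(y, v): sum(counts.get((x, y, v), 0) for x in X) for y, v in product(Y, V)}
    n_y = {y: sum(n_yv[(y, v)] for v in V) for y in Y}

    # |T_{Q_{V|XY}}(x^n, y^n)|: arrange the v-symbols inside each (x, y) block.
    size_v_given_xy = 1
    for x, y in product(X, Y):
        size_v_given_xy *= multinomial([counts.get((x, y, v), 0) for v in V])
    # |T_{Q_{V|Y}}(y^n)|: arrange the v-symbols inside each y block.
    size_v_given_y = 1
    for y in Y:
        size_v_given_y *= multinomial([n_yv[(y, v)] for v in V])

    # Empirical conditional entropies in bits; I(X;V|Y) = H(V|Y) - H(V|X,Y).
    h_v_xy = sum(c * log2(n_xy[(x, y)] / c) for (x, y, v), c in counts.items() if c > 0) / n
    h_v_y = sum(c * log2(n_y[y] / c) for (y, v), c in n_yv.items() if c > 0) / n
    cond_mi = h_v_y - h_v_xy

    exact = size_v_given_xy / size_v_given_y
    bound = (n + 1) ** (-len(X) * len(Y) * len(V)) * 2 ** (-n * cond_mi)
    return exact, bound


# Example joint type with n = 12 (arbitrary illustrative counts).
counts = {(0, 0, 0): 3, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
          (1, 0, 0): 1, (1, 0, 1): 2, (1, 1, 1): 2}
exact, bound = lemma4_check(counts, X=(0, 1), Y=(0, 1), V=(0, 1))
print(exact, bound)
```

The exact ratio is the probability that a uniform draw from $T_{Q_{V|Y}}(y^n)$ lands in $T_{Q_{V|XY}}(x^n, y^n)$, which in turn lower-bounds the probability appearing in the lemma; the polynomial factor $c_n$ makes the bound loose for small $n$.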

We now show that the above described scheme indeed achieves the exponent in (6). Consider any possibly random function $f$ implemented by the primary user (and satisfying the distortion constraint), and denote by $P_f$ the induced joint probability on $(X^n, Y^n)$. Now, consider the following chain of inequalities.

$$
\begin{aligned}
\Pr\big(d_e(X^n, g(f(X^n))) \le D_e\big)
&= \sum_{x^n \in \mathcal{X}^n} \sum_{y^n \in \mathcal{Y}^n} P(x^n) P_f(y^n|x^n)\, P\big(x^n \in B_{D_e}(g(y^n))\big) \\
&\overset{(a)}{\ge} c'_n \sum_{x^n \in \mathcal{X}^n} \sum_{y^n \in \mathcal{Y}^n} P(x^n) P_f(y^n|x^n)\, 2^{-n I_{P^*(Q_{x^n y^n})}(X;V|Y)} \\
&\ge c'_n \sum_{x^n \in \mathcal{X}^n} P(x^n) \sum_{y^n \in \mathcal{Y}^n} P_f(y^n|x^n) \min_{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_{x^n}, D)} 2^{-n I_{P^*(Q_{XY})}(X;V|Y)} \\
&= c'_n \sum_{Q_X \in \mathcal{Q}^n_X} \sum_{x^n \in T_{Q_X}} P(x^n) \min_{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D)} 2^{-n I_{P^*(Q_{XY})}(X;V|Y)} \\
&\overset{(b)}{=} c'_n \sum_{Q_X \in \mathcal{Q}^n_X} \sum_{x^n \in T_{Q_X}} 2^{-n(D(Q_X\|P) + H_{Q_X}(X))}\, 2^{-n I_{P^*_n(Q_X)}(X;V|Y)} \\
&\overset{(c)}{\ge} c'_n (n+1)^{-|\mathcal{X}|} \sum_{Q_X \in \mathcal{Q}^n_X} 2^{-n(D(Q_X\|P) + I_{P^*_n(Q_X)}(X;V|Y))} \\
&\ge c'_n (n+1)^{-|\mathcal{X}|} \max_{Q_X \in \mathcal{Q}^n_X} 2^{-n(D(Q_X\|P) + I_{P^*_n(Q_X)}(X;V|Y))},
\end{aligned} \tag{21}
$$

where

(a) follows from Lemma 5.

(b) follows from (20) and (15).

(c) follows from Lemma 1.2.3 ($|T_{Q_X}| \ge (n+1)^{-|\mathcal{X}|} 2^{nH_{Q_X}(X)}$).

Taking the limit as $n$ goes to infinity, we get

$$
\begin{aligned}
E^+(P, D, D_e) &= \limsup_{n \to \infty} \max_{\{f_n\}} \min_{\{g_n\}} -\frac{1}{n} \log \Pr\big(d_e(X^n, g_n(f_n(X^n))) \le D_e\big) \\
&\le \min_Q D(Q\|P) + R(Q, D, D_e),
\end{aligned} \tag{22}
$$

where the last inequality follows from Proposition 3.

Combining (22) and (17) yields that the limit in (3) exists and is equal to the expression given in (6), thus establishing Theorem 1.


IV. Lossy Communication for the Shannon Cipher 
System 

We now consider the setup of the Shannon cipher system with lossy communication. More precisely, the transmitter is subject to a rate constraint, and the transmitter and legitimate receiver share common randomness $K \in \mathcal{K} = \{0,1\}^{nr}$, where $r > 0$ denotes the rate of the key. $K$ is uniformly distributed over $\mathcal{K}$ and is independent of $X^n$. The transmitter sends a message $M = f(X^n, K)$ to the receiver over a noiseless channel at rate $R$, i.e., $M \in \mathcal{M} = \{0,1\}^{nR}$. The receiver then generates $Y^n = h(M, K)$. Both functions $f$ and $h$ are allowed to be stochastic, but must satisfy $\Pr(d(X^n, Y^n) > D) \le 2^{-n\alpha}$, for a given reliability exponent $\alpha > 0$.

The message $M$ is overheard by the eavesdropper, who knows the statistics of the source and the encoding and decoding functions $f$ and $h$. However, s/he does not have access to the common randomness $K$.

As before, the relevant secrecy metric is the probability of a successful guess, i.e., a guess $V^n = g(M)$ satisfying $d_e(X^n, V^n) \le D_e$. The optimal guess is determined, again, by the MAP rule $g_0$.

We assume (A1)-(A3) (given in Section|Hl|> hold throughout 
this section. We further assume⁴
(A4) $R > R_\alpha := \max_{Q:\, D(Q\|P) \le \alpha} R(Q, D)$.

⁴ For the primary user's problem to be feasible, it is necessary to have $R \ge \max_{Q:\, D(Q\|P) \le \alpha} R(Q, D)$.


Let $\vec{D} = (D, D_e)$ and $\vec{R} = (R, r)$. For a given DMS $P$, distortion vector $\vec{D}$, rate vector $\vec{R}$, and reliability exponent $\alpha$, we denote the optimal exponent by $E(P, \vec{D}, \vec{R}, \alpha)$, i.e.,

$$E(P, \vec{D}, \vec{R}, \alpha) = \lim_{n \to \infty} \max_{\{f_n\}} \min_{\{g_n\}} -\frac{1}{n} \log \Pr\big(d_e(X^n, g_n(f_n(X^n, K))) \le D_e\big), \tag{23}$$

where $\{f_n\}$ is restricted to the class of functions ensuring the feasibility of the primary user's problem. Similarly to (23), we define $E^-(P, \vec{D}, \vec{R}, \alpha)$ and $E^+(P, \vec{D}, \vec{R}, \alpha)$ using the $\liminf$ and $\limsup$, respectively.

We extend the definition of $R(P_X, D, D_e)$ to account for the rate constraint as follows. For a given distribution $P_X$ satisfying $R(P_X, D) \le R$,

$$R(P_X, R, D, D_e) = \max_{\substack{P_{Y|X}:\ E[d(X,Y)] \le D \\ I(X;Y) \le R}} R(P_{XY}, D_e). \tag{24}$$

Extending the properties of $R(P_X, D, D_e)$, we prove the following properties of $R(P_X, R, D, D_e)$ in Appendix C.

Proposition 6: In the following statements, $D \ge D_{\min}$, $D_e \ge D_{e,\min}$, and a given pair $(P_X, R)$ satisfies $R \ge R(P_X, D)$.

(P4) For fixed $P_X$, $R(P_X, R, D, D_e)$ is a finite-valued function of $(R, D, D_e)$. Moreover, for fixed $D_e$, $R(P_X, R, D, D_e)$ is continuous in the triple $(P_X, R, D)$ over the set $S = \{(P_X, R, D) : P_X \in \mathcal{P}_X,\ D \ge D_{\min},\ R \ge R(P_X, D)\}$.

(P5) $R_e(P_X, D_e) - R(P_X, D) \le R(P_X, R, D, D_e) \le R(P_X, D, D_e) \le R_e(P_X, D_e)$.

The main result is given by the following theorem.

Theorem 2: Under assumptions (A1)-(A4), for any DMS $P$, distortion functions $d$ and $d_e$ with associated distortion levels $D \ge D_{\min}$ and $D_e \ge D_{e,\min}$, corresponding respectively to the primary user and the eavesdropper, and reliability exponent $\alpha$:

$$E(P, \vec{D}, \vec{R}, \alpha) = \min\Big\{ E_0(P, D_e),\ r + \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e) \Big\}. \tag{25}$$


Remark 7: The minimization over $Q$ is due to the imposition of an exponentially decaying probability of violating the distortion constraint. If we replace it instead by $\Pr(d(X^n, Y^n) > D) \le \delta$, for some small $\delta$, then the second term of (25) would collapse to $r + R(P, R, D, D_e)$.

Remark 8: We can recover Theorem 1 by setting $\alpha = +\infty$, $r = 0$, and $R = \log |\mathcal{Y}|$. Weinberger and Merhav's result (their Theorem 1) can also be recovered by noting that the leniency assumption implies $R(Q, R, D, D_e) = 0$ for all $Q$. Moreover, for any $D_e < \min_{v \in \mathcal{V}} E_P[d_e(X, v)]$ and $r > 0$, $E(P, \vec{D}, \vec{R}, \alpha) > 0$. Indeed, the first condition implies $R_e(P, D_e) > 0$, hence $E_0(P, D_e) > 0$. This refines Schieler and Cuff's observation [23] that any positive key rate drives the distortion at the eavesdropper to its maximal expected value with high probability.


A straightforward but useful corollary of Theorem 2 is a necessary and sufficient condition on the key rate for the achievability of the perfect secrecy exponent. In particular,

$$E(P, \vec{D}, \vec{R}, \alpha) = E_0(P, D_e) \ \text{ if and only if } \ r \ge E_0(P, D_e) - \min_{Q:\, D(Q\|P) \le \alpha} \big[ D(Q\|P) + R(Q, R, D, D_e) \big]. \tag{26}$$

Let $r_0$ be the minimum rate needed to achieve $E_0(P, D_e)$. The condition in (26) is interesting in that it allows $r_0$ to be strictly less than $E_0(P, D_e)$, which itself satisfies $E_0(P, D_e) \le R_e(P, D_e)$.

Remark 9: One might suspect that $r \ge \max_{Q:\, D(Q\|P) \le \alpha} R(Q, D)$ is sufficient to achieve $E_0(P, D_e)$, since we can use good rate-distortion codes for each type and the number of available keys is large enough to completely "hide" the source sequence within a type class. This is, indeed, true as it implies the condition in (26). For every $Q$:

$$
\begin{aligned}
& D(Q\|P) + R_e(Q, D_e) - D(Q\|P) - R_e(Q, D_e) + R(Q, D) = R(Q, D), \\
\Rightarrow\ & \min_{Q'} \big[ D(Q'\|P) + R_e(Q', D_e) \big] - D(Q\|P) - R_e(Q, D_e) + R(Q, D) \le R(Q, D), \\
\text{By Property (P5): } \Rightarrow\ & \min_{Q'} \big[ D(Q'\|P) + R_e(Q', D_e) \big] - D(Q\|P) - R(Q, R, D, D_e) \le R(Q, D), \\
\Rightarrow\ & E_0(P, D_e) + \max_{Q:\, D(Q\|P) \le \alpha} \big[ -D(Q\|P) - R(Q, R, D, D_e) \big] \le \max_{Q:\, D(Q\|P) \le \alpha} R(Q, D).
\end{aligned}
$$


The converse of Theorem 2 is based on the following analysis. To achieve the second exponent in (25), the eavesdropper tries to guess the value of the key and then applies the scheme suggested in the previous section. Taking into consideration the rate constraint, the term $R(Q, D, D_e)$ which appears in (6) is replaced by $R(Q, R, D, D_e)$. Also, taking into account the modified distortion constraint, the minimization over all $Q$'s which appears in (6) is replaced by a minimization over $Q$'s satisfying $D(Q\|P) \le \alpha$. The first exponent is the perfect secrecy exponent (given in (7)), which the eavesdropper can achieve even in the absence of any observation. The fact that one of these two schemes achieves the optimal exponent implies that the eavesdropper does not benefit from guessing only part of the key. Either s/he guesses the entire key correctly and proceeds, or s/he makes a completely blind guess. Interestingly, a similar observation has been made by Schieler and Cuff [25] in the context of minimum expected distortion over a list.

To describe the achievability result, it is helpful to rewrite (25) as:

$$E(P, \vec{D}, \vec{R}, \alpha) = \min\Big\{ \min_{Q:\, D(Q\|P) > \alpha} D(Q\|P) + R_e(Q, D_e),\ \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + \min\{ r + R(Q, R, D, D_e),\ R_e(Q, D_e) \} \Big\}. \tag{27}$$

The primary user will operate as follows. For low-probability types $Q$, particularly $Q$'s with $D(Q\|P) > \alpha$, the transmitter will send a dummy message. This is feasible because we allowed some probability of violating the distortion constraint. For such $Q$'s, the eavesdropper receives no information. Therefore, the guessing exponent conditioned on $T_Q$ is given by $R_e(Q, D_e)$, yielding the first term of (27). For $Q$'s satisfying $D(Q\|P) \le \alpha$, let $E(\vec{D}, \vec{R}, Q) = \min\{ r + R(Q, R, D, D_e),\ R_e(Q, D_e) \}$. This can be understood as the exponent conditioned on $X^n \in T_Q$. For each such $Q$, we associate a joint type induced by a $P_{Y|X}$ that achieves the maximum in (24). Similarly to Section III-A, we use this joint type to generate a rate-distortion code. This roughly corresponds to the term $R(Q, R, D, D_e)$. To take advantage of the secret key, we in fact produce $2^{nr}$ such codes, and use the key to randomize the choice of the code, yielding the additional $r$ term. Since the eavesdropper can always guess blindly and achieve the exponent $R_e(Q, D_e)$, we get $\min\{ r + R(Q, R, D, D_e),\ R_e(Q, D_e) \}$. We will show in Lemma 9 that such a random construction fails to achieve the desired exponent with only doubly exponentially small probability.

As mentioned, the code construction for each type depends on the conditional $P_{Y|X}$ achieving the maximum in (24). A natural question arises: under what conditions does the optimal test channel (which we will denote by $P^*_{Y|X}$) achieve that maximum? One can readily verify that this holds when $R(Q, R, D, D_e) = 0$ (e.g., the eavesdropper's constraint is more lenient than that of the legitimate receiver). We further investigate this question by considering special cases of Theorem 2.

A. Applications of Theorem 2

In the following, assume $\alpha = +\infty$. Hence, $R > \max_Q R(Q, D)$.

1) Perfect Reconstruction at the Eavesdropper: Suppose $\mathcal{V} = \mathcal{X}$, and the eavesdropper is required to reconstruct the source sequence perfectly, i.e., the secrecy metric is $\Pr(V^n = X^n)$. In our formulation, this is equivalent to setting $d_e$ to be the Hamming distance and $D_e$ to 0. Then, for each $Q$, we get

$$
\begin{aligned}
R(Q, R, D, 0) &= \max_{\substack{P_{Y|X}:\ E[d(X,Y)] \le D \\ I(X;Y) \le R}} R(P_{XY}, 0) \\
&= \max_{\substack{P_{Y|X}:\ E[d(X,Y)] \le D \\ I(X;Y) \le R}} H(X|Y) = H_Q(X) - R(Q, D).
\end{aligned}
$$

Note that the maximum is achieved by the optimal test channel, and the exponent is given by

$$E(P, \vec{D}, \vec{R}) = \min_Q D(Q\|P) + \min\{ r + H_Q(X) - R(Q, D),\ H_Q(X) \},$$

where we have used the equivalent form (27). Note that, in contrast to $R(Q, R, D, D_e) = 0$, this case corresponds to a more lenient constraint at the legitimate receiver, which leads us to our next example.
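The perfect-reconstruction exponent can be evaluated numerically in a simple special case. The sketch below assumes a binary source $P = \mathrm{Ber}(p)$ with Hamming distortion $d$ and $D \le 1/2$, so that $R(Q, D) = \max\{H(q) - H(D), 0\}$ for $Q = \mathrm{Ber}(q)$; this specialization, the grid search, and the function names are assumptions made for illustration, not part of the paper.

```python
from math import log2


def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def kl2(q, p):
    """D(Ber(q) || Ber(p)) in bits, for 0 < p < 1."""
    return sum(a * log2(a / b) for a, b in ((q, p), (1 - q, 1 - p)) if a > 0)


def perfect_reconstruction_exponent(p, D, r, grid=2000):
    """Grid search over Q = Ber(q) of D(Q||P) + min{r + H_Q(X) - R(Q, D), H_Q(X)}."""
    best = float("inf")
    for i in range(grid + 1):
        q = i / grid
        rate = max(h2(q) - h2(D), 0.0)  # binary Hamming R(Q, D), valid for D <= 1/2
        best = min(best, kl2(q, p) + min(r + h2(q) - rate, h2(q)))
    return best


print(perfect_reconstruction_exponent(p=0.3, D=0.1, r=0.2))
```

When $D = 1/2$ the legitimate receiver needs no information, $R(Q, D) = 0$ for every $Q$, and the expression reduces to $\min_Q D(Q\|P) + H_Q(X) = -\log \max_x P(x)$, the exponent of guessing $X^n$ with no observation.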

2) Binary Source with Hamming Distortion and $D_e < D$: Suppose $\mathcal{X} = \mathcal{Y} = \mathcal{V} = \{0,1\}$, $d$ and $d_e$ are both the Hamming distance, and $D_e < D < 1/2$. We prove the following lemma in Appendix D.

Lemma 7: If $D_e < D < 1/2$,

$$
R(Q, D, D_e) = R_e(Q, D_e) - R(Q, D) =
\begin{cases}
0, & H(Q) \le H(D_e), \\
H(Q) - H(D_e), & H(D_e) < H(Q) \le H(D), \\
H(D) - H(D_e), & H(Q) > H(D).
\end{cases}
$$

It follows from property (P5) of Proposition 6 that

$$R(Q, D, D_e) = R_e(Q, D_e) - R(Q, D) \ \Rightarrow\ R(Q, R, D, D_e) = R_e(Q, D_e) - R(Q, D).$$

Therefore, the exponent is given by:

$$E(P, \vec{D}, \vec{R}) = \min\{E_1, E_2, E_3\},$$

where

$$E_1 = \min\{ D(Q\|P) : H(Q) \le H(D_e) \},$$

$$E_2 = \min\{ D(Q\|P) + H(Q) - H(D_e) : H(D_e) \le H(Q) \le H(D) \},$$

and

$$E_3 = \min\{ D(Q\|P) + \min\{ r + H(D) - H(D_e),\ H(Q) - H(D_e) \} : H(Q) \ge H(D) \}.$$

If $X \sim \mathrm{Ber}(1/2)$, then $D(Q\|P) = 1 - H(Q)$, and the minima corresponding to the first two cases reduce to $1 - H(D_e)$. The third minimum can be computed as follows:

$$\min_{H(Q) \ge H(D)} 1 - H(D_e) + \min\{ r + H(D) - H(Q),\ 0 \} = 1 - H(D_e) + \min\{ r + H(D) - 1,\ 0 \}.$$

Therefore,

$$
E(P, \vec{D}, \vec{R}) =
\begin{cases}
1 - H(D_e), & r \ge 1 - H(D), \\
r + H(D) - H(D_e), & r < 1 - H(D).
\end{cases}
$$
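The closed form for the uniform binary source can be cross-checked against a direct minimization. The sketch below is a hedged illustration: it evaluates $\min_Q D(Q\|P) + \min\{r + R_e(Q,D_e) - R(Q,D),\ R_e(Q,D_e)\}$ on a grid of Bernoulli types, which coincides with $\min\{E_1, E_2, E_3\}$ under Lemma 7 with $\alpha = +\infty$. The grid resolution and the function names are assumptions introduced here.

```python
from math import log2


def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def kl2(q, p):
    """D(Ber(q) || Ber(p)) in bits, for 0 < p < 1."""
    return sum(a * log2(a / b) for a, b in ((q, p), (1 - q, 1 - p)) if a > 0)


def exponent_grid(p, D, De, r, grid=2000):
    """Grid search over Q = Ber(q); requires De < D < 1/2 (Hamming distortion)."""
    best = float("inf")
    for i in range(grid + 1):
        q = i / grid
        r_e = max(h2(q) - h2(De), 0.0)  # R_e(Q, D_e)
        r_d = max(h2(q) - h2(D), 0.0)   # R(Q, D)
        best = min(best, kl2(q, p) + min(r + r_e - r_d, r_e))
    return best


def exponent_uniform_closed_form(D, De, r):
    """Closed form derived in the text for X ~ Ber(1/2)."""
    return 1 - h2(De) if r >= 1 - h2(D) else r + h2(D) - h2(De)


for r in (0.1, 0.5):
    print(r, exponent_grid(0.5, 0.2, 0.1, r), exponent_uniform_closed_form(0.2, 0.1, r))
```

For $D = 0.2$ the threshold key rate is $1 - H(0.2)$, so the two values of `r` in the loop fall on either side of it and exercise both branches of the closed form.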

The resulting expression when $r < 1 - H(D)$ admits a simple geometric explanation, shown in Figure 3 below. Upon observing the public message, the candidate source sequences are clustered into $2^{nr}$ balls. Each ball corresponds to a possible value of the key $K$, and has volume $2^{nH(D)}$ since it is the pre-image of a possible reconstruction at the legitimate receiver. For the eavesdropper, the maximum volume of the ball that s/he can generate to "engulf" candidate sequences is $2^{nH(D_e)}$. Due to the structure of Hamming distortion, this maximally-sized ball can fit entirely into any one of the clusters, so that the probability of a successful guess is $2^{nH(D_e)} 2^{-n(r + H(D))}$. Note that the geometric interpretation assumed that we are using good rate-distortion codes (to get pre-images of volume $2^{nH(D)}$). The described structure is also reminiscent of successive refinement [29]. These can be explained by the following lemma.

Fig. 3. The dots represent sequences in a type class $T_Q$. Each of the $2^{nr}$ non-dashed circles represents a Hamming-distortion ball of radius $D$, corresponding to a possible reconstruction at the legitimate receiver. Thus, dots within the circles (in blue) represent candidate source sequences. The dashed circle represents the distortion ball of radius $D_e$ around the eavesdropper's reconstruction, and it fits entirely in a non-dashed circle.

Lemma 8: If $R(Q, D, D_e) = R_e(Q, D_e) - R(Q, D)$, then the optimal test channel $P^*_{Y|X}$ achieves the maximum in (5). Moreover, $Q$ is successively refinable from $D$ to $D_e$.

Proof: Consider the proof of the lower bound in (P3) 
of Proposition [T] R(Q, D, D e ) = R e (Q,D e ) — R(Q,D) 
i is 


implies that 


the maximum. Moreover, 

j(i) 


an equality. Hence, Py\ x achieves 


Py\ X Y the minimizer in ( [55] , and Py\ XY the minimizer 


becomes an equality. Let 

( 2 ) 


in 


Then,

$$H_{P^*_{XY} P^{(1)}_{V|XY}}(X|V) \le H_{P^*_{XY} P^{(2)}_{V|XY}}(X|V) = H_{P^*_{XY} P^{(1)}_{V|XY}}(X|V,Y) \le H_{P^*_{XY} P^{(1)}_{V|XY}}(X|V).$$

Therefore, $P^{(1)}_{V|XY}$ satisfies $E[d_e(X,V)] \le D_e$ and $H_{P^*_{XY} P^{(1)}_{V|XY}}(X|V,Y) = H_{P^*_{XY} P^{(1)}_{V|XY}}(X|V)$, i.e., the Markov chain $X - V - Y$ holds. Finally, note that $I(X;V) = I(X;(V,Y)) = I(X;Y) + I(X;V|Y) = R_e(Q, D_e)$, implying that $Q$ is successively refinable from $D$ to $D_e$. ■


B. Achievability Proof

We show that $E^-(P, \vec{D}, \vec{R}, \alpha) \ge \min\{ E_0(P, D_e),\ r + \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e) \}$ by demonstrating an encoding-decoding strategy for the primary user that achieves the given exponent.

As before, the primary user will operate on the source sequences on a type-by-type basis. The result is driven by the following lemma, which is based on the analysis of Schieler and Cuff [25] and the proof of which is given in Appendix E.

Lemma 9: Let $\epsilon > 0$, $n \in \mathbb{N}$, $Q_{XY} \in \mathcal{Q}^n_{XY}$ be given. Let $N$ be an integer such that $2^{n(I_{Q_{XY}}(X;Y) + 2\epsilon/3)} \le N \le 2^{n(I_{Q_{XY}}(X;Y) + \epsilon)}$. Generate a code $C^n = (Y^n_1, Y^n_2, \ldots, Y^n_N)$ by choosing $N$ elements independently and uniformly at random from $T_{Q_Y}$.

1) Covering: For $x^n \in T_{Q_X}$, define

$$C(x^n) = \{ m \in [N] : (x^n, Y^n_m) \in T_{Q_{XY}} \}, \tag{28}$$

$$N_{x^n} = |C(x^n)|, \tag{29}$$

and the event

$$\mathcal{E} = \{ C^n : \text{there exists } x^n \in T_{Q_X} \text{ such that } N_{x^n} = 0 \text{ or } N_{x^n} > 2^{2n\epsilon} \}. \tag{30}$$

Then, there exists $n_1(\epsilon, |\mathcal{X}|, |\mathcal{Y}|)$ (independent of $Q_{XY}$) such that, for all $n \ge n_1$,

$$\Pr(\mathcal{E}) \le e^{-2^{n\epsilon/7}}. \tag{31}$$

2) Guessing with a single code: Suppose $X^n \sim \mathrm{Unif}(T_{Q_X})$ and $C^n \notin \mathcal{E}$. Let $P_{M|X^n}$ be as follows. Given $x^n$, $M$ is chosen uniformly at random from $C(x^n)$. Then, for all $n \ge n_1$, for all $v^n \in \mathcal{V}^n$ and all $m \in [N]$,

$$\Pr\big(d_e(X^n, v^n) \le D_e \mid M = m, C^n\big) \le 2^{-n(R(Q_{XY}, D_e) - 4\epsilon)}, \tag{32}$$

and

$$E\big[ \Pr\big(d_e(X^n, v^n) \le D_e \mid M = m, C^n\big) \,\big|\, \mathcal{E}^c \big] \le 2^{-n(R_e(Q_X, D_e) - 4\epsilon)}, \tag{33}$$

where the probabilities are computed with respect to the randomness in $P_{M|X^n}$, and the expectation with respect to the distribution of the code $C^n$.

3) Guessing with multiple codes: Let $K$ be uniform over $\mathcal{K} = [2^{nr}]$, $r > 0$, and independent of $X^n$ (which is uniform over $T_{Q_X}$). For each $k \in [2^{nr}]$, generate $C^n_k$ as described above. Define $P_{M|X^n,K}$ as follows. Given $k$, if $C^n_k \in \mathcal{E}$, then $M$ is chosen uniformly at random from $[N]$ independently of $X^n$. If $C^n_k \notin \mathcal{E}$, then $M$ is chosen uniformly at random from $C_k(X^n)$. Let

$$\tilde{\mathcal{E}} = \Big\{ \{C^n_k\}_{k=1}^{2^{nr}} : \text{there exists } m \in [N] \text{ such that } \max_{v^n \in \mathcal{V}^n} \Pr\big(d_e(X^n, v^n) \le D_e \mid M = m, \{C^n_k\}\big) > 2^{-n(\min\{R_e(Q_X, D_e),\ r + R(Q_{XY}, D_e)\} - 8\epsilon)} \Big\}, \tag{34}$$

where the probability is computed with respect to $P_{M|X^n,K}$. Then, for all $n \ge n_1$,

$$\Pr(\tilde{\mathcal{E}}) \le e^{-2^{n\epsilon/9}}, \tag{35}$$

where the probability is computed with respect to the distributions of the codes $\{C^n_k\}_k$. □
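The covering quantities in (28) and (29) can be made concrete with a toy simulation. The sketch below is an illustration under assumptions chosen here (binary alphabets, blocklength $n = 8$, a fixed joint type given as counts, a seeded generator, and hypothetical function names); it draws $N$ codewords independently and uniformly from $T_{Q_Y}$ and counts, for each $x^n \in T_{Q_X}$, how many codewords are jointly typical with it.

```python
import random
from itertools import combinations


def joint_counts(x, y):
    """Joint type of (x, y) as counts (n00, n01, n10, n11), indexed by (x_i, y_i)."""
    c = [0, 0, 0, 0]
    for a, b in zip(x, y):
        c[2 * a + b] += 1
    return tuple(c)


def type_class(n, ones):
    """All binary sequences of length n with exactly `ones` ones."""
    seqs = []
    for pos in combinations(range(n), ones):
        s = [0] * n
        for i in pos:
            s[i] = 1
        seqs.append(tuple(s))
    return seqs


def cover_counts(n, q_xy, N, seed=0):
    """Draw N codewords uniformly from T_{Q_Y}; return {x^n: N_{x^n}} over T_{Q_X}."""
    n00, n01, n10, n11 = q_xy
    rng = random.Random(seed)
    t_x = type_class(n, n10 + n11)   # sequences with the X-marginal of q_xy
    t_y = type_class(n, n01 + n11)   # sequences with the Y-marginal of q_xy
    code = [rng.choice(t_y) for _ in range(N)]
    return {x: sum(joint_counts(x, y) == q_xy for y in code) for x in t_x}


counts = cover_counts(n=8, q_xy=(3, 1, 1, 3), N=40)
print(min(counts.values()), max(counts.values()))
```

Each codeword is jointly typical with exactly $|T_{Q_{X|Y}}(y^n)|$ source sequences (16 for this joint type), so the counts always sum to $16N$; the lemma concerns how evenly that total spreads over $T_{Q_X}$ as $n$ grows, which a blocklength this small can only hint at.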


The first part of the lemma asserts that if we generate the codebook randomly, then each $x^n \in T_{Q_X}$ will be covered by a small number of codewords (the probability that this event does not occur is doubly exponentially small). Therefore, if we encode $x^n$ by choosing a codeword uniformly at random from its cover, the induced $P_{X^n|Y^n}(\cdot|y^n)$ will be roughly uniform over the set $T_{Q_{X|Y}}(y^n) = \{x^n : (x^n, y^n) \in T_{Q_{XY}}\}$. Consequently, given a codeword index $m$ and $v^n \in \mathcal{V}^n$, the second part bounds the probability that $v^n$ covers $X^n$, and also bounds the expectation (over the choice of the codebook) of that probability.

Finally, the third part considers generating $2^{nr}$ codebooks and the induced distribution $P_{X^n|M}$, where $M$ is the index of a chosen codeword. This distribution roughly corresponds to generating $2^{nr}$ elements uniformly at random from $T_{Q_Y}$, revealing the chosen elements to the adversary, then choosing one of them uniformly at random and generating $X^n$ uniformly at random from $T_{Q_{X|Y}}(Y^n)$. This setup is similar to the one studied by Schieler and Cuff [25, Theorem 4]. Equation (35) states that, for most realizations of the codebooks, the probability that the adversary generates a successful guess, given a codeword index, is upper-bounded by $2^{-n \min\{R_e(Q_X, D_e),\ r + R(Q_{XY}, D_e)\}}$. The implication is that the best the adversary could do is either 1) ignore the index and guess $X^n$ blindly, or 2) guess which codebook is being used (i.e., guess the value of the key $K$) and use the scheme suggested in the previous section.


Now, fix $\delta > 0$ such that $R_{\alpha+\delta} = \max_{Q:\, D(Q\|P) \le \alpha+\delta} R(Q, D) < R$. Note that such $\delta$ exists since $\lim_{\delta \to 0} R_{\alpha+\delta} = R_\alpha$ (which follows from Proposition 13 in Appendix A and the fact that $D(Q\|P)$ is convex). Fix $R'$ such that $R_{\alpha+\delta} < R' < R$, and $\epsilon > 0$ such that $\epsilon < R - R'$. Let

$$\mathcal{Q}^n_X(\alpha, \delta) = \{ Q \in \mathcal{Q}^n_X : D(Q\|P) \le \alpha + \delta \}. \tag{36}$$

Let $n$ be large as given by Lemma 9. For each type $Q_X \in \mathcal{Q}^n_X(\alpha, \delta)$, we associate a joint type $Q_{XY}$ and generate $2^{nr}$ codebooks $\{C_k\}_{k=1}^{2^{nr}} \in \tilde{\mathcal{E}}^c$, where the size of each codebook is upper-bounded by $2^{n(I_{Q_{XY}}(X;Y) + \epsilon)}$ (the existence of such codes follows from (35)). Since the primary user wants to minimize the probability of a successful guess by the eavesdropper, but must also satisfy a rate constraint, the associated type is chosen as follows:

$$Q^*_{R'}(Q_X) \in \operatorname*{argmax}_{\substack{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D): \\ I_{Q_{XY}}(X;Y) \le R'}} R(Q_{XY}, D_e). \tag{37}$$

The encoding function $f$ is as follows. Given a source sequence $x^n$ satisfying $Q_{x^n} \in \mathcal{Q}^n_X(\alpha, \delta)$, and a realization of the key $k$, a reconstruction sequence is chosen uniformly at random from $C_k(x^n)$ (cf. (28)). The associated message is then given by:

• $\lceil \log |\mathcal{Q}^n_X| \rceil$ bits to describe $Q_X$.

• $\lceil \log |C_k| \rceil$ bits to describe the index of the reconstruction.

The legitimate receiver uses the first part of the message and the key to determine which codebook is being used, and then uses the second part of the message to recover the reconstruction $Y^n$. Finally, all sequences $x^n$ such that $Q_{x^n} \notin \mathcal{Q}^n_X(\alpha, \delta)$ are mapped to an arbitrary message $m_0$.

Remark 10: One can check that this encoding is feasible by noting that the required number of bits satisfies:

$$
\begin{aligned}
\lceil \log |\mathcal{Q}^n_X| \rceil + \lceil \log |C_k| \rceil
&\le |\mathcal{X}| \log(n+1) + 1 + n\big( I_{Q^*_{R'}(Q_X)}(X;Y) + \epsilon \big) + 1 \\
&\le n\Big( R' + \epsilon + \frac{|\mathcal{X}| \log(n+1)}{n} + \frac{2}{n} \Big) \le nR,
\end{aligned}
$$

for $n$ large enough. Moreover, it satisfies the excess distortion probability constraint, since

$$
\begin{aligned}
\Pr\big(d(X^n, Y^n) > D\big) &\le \sum_{Q_X \notin \mathcal{Q}^n_X(\alpha, \delta)} P(T_{Q_X}) \\
&\le \sum_{Q_X \notin \mathcal{Q}^n_X(\alpha, \delta)} 2^{-nD(Q_X\|P)} \\
&\le (n+1)^{|\mathcal{X}|} 2^{-n(\alpha+\delta)} \le 2^{-n\alpha},
\end{aligned}
$$

where the last inequality holds for large enough $n$.
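The bit count in Remark 10 can be tabulated for concrete parameters. The sketch below is a rough illustration with assumed names (`num_types`, `message_bits`) and arbitrary example values; it uses the exact number of types, $\binom{n+|\mathcal{X}|-1}{|\mathcal{X}|-1}$, which is at most $(n+1)^{|\mathcal{X}|}$, and bounds the index length by $\lceil n(I + \epsilon) \rceil$.

```python
from math import ceil, comb, log2


def num_types(n, k):
    """Number of types with denominator n on an alphabet of size k."""
    return comb(n + k - 1, k - 1)


def message_bits(n, k, mutual_info, eps):
    """Upper bound on header bits plus index bits for a codebook of size <= 2^{n(I+eps)}."""
    header = ceil(log2(num_types(n, k)))     # bits to describe the type Q_X
    index = ceil(n * (mutual_info + eps))    # bits to index the reconstruction
    return header + index


for n in (100, 1000, 10000):
    bits = message_bits(n, k=2, mutual_info=0.3, eps=0.05)
    print(n, bits, bits / n)
```

The per-symbol cost approaches $I + \epsilon$ because the header grows only logarithmically in $n$, which is why any $R > R' + \epsilon$ is eventually sufficient.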


To analyze the performance of the eavesdropper, note that when s/he observes a message $m \ne m_0$, the induced distribution $P_{X^n|M=m}$ is exactly the setup studied in part three of Lemma 9. Indeed, the message $m$ indicates the type of the transmitted sequence and the index of the reconstruction (among $2^{nr}$ possible codebooks). For $m = m_0$, i.e., for sequences of type outside $\mathcal{Q}^n_X(\alpha, \delta)$, the performance can still be analyzed in light of Lemma 9 by considering the associated $Q_{XY}$ to be of the form $Q_X Q_Y$ (i.e., $X$ and $Y$ are independent), in which case $\min\{R_e(Q_X, D_e),\ r + R(Q_{XY}, D_e)\} = \min\{R_e(Q_X, D_e),\ r + R_e(Q_X, D_e)\} = R_e(Q_X, D_e)$. Now, consider the following chain of inequalities.

$$
\begin{aligned}
\Pr\big(d_e(X^n, g_0(f(X^n, K))) \le D_e\big)
&= \sum_{Q_X \in \mathcal{Q}^n_X} \sum_{x^n \in T_{Q_X}} \sum_{m \in \mathcal{M}} P(x^n) P_f(m|x^n)\, \mathbb{1}\{x^n \in B_{D_e}(g_0(m))\} \\
&= \sum_{Q_X \in \mathcal{Q}^n_X} P(T_{Q_X}) \sum_{m \in \mathcal{M}} \sum_{x^n \in T_{Q_X}} P_f(m|T_{Q_X}) P_f(x^n|m, T_{Q_X})\, \mathbb{1}\{x^n \in B_{D_e}(g_0(m))\} \\
&= \sum_{Q_X \in \mathcal{Q}^n_X} P(T_{Q_X}) \sum_{m \in \mathcal{M}} P_f(m|T_{Q_X}) \max_{v^n \in \mathcal{V}^n} P_f\big(d_e(X^n, v^n) \le D_e \mid m, T_{Q_X}\big) \\
&\overset{(a)}{\le} \sum_{Q_X \in \mathcal{Q}^n_X} P(T_{Q_X}) \sum_{m \in \mathcal{M}} P_f(m|T_{Q_X})\, 2^{-n(\min\{R_e(Q_X, D_e),\ r + R(Q^*_{R'}(Q_X), D_e)\} - 8\epsilon)} \\
&\le \sum_{Q_X \in \mathcal{Q}^n_X} 2^{-n(D(Q_X\|P) + \min\{R_e(Q_X, D_e),\ r + R(Q^*_{R'}(Q_X), D_e)\} - 8\epsilon)} \\
&\le (n+1)^{|\mathcal{X}|} \max_{Q_X \in \mathcal{Q}^n_X} \exp\Big\{ -n\big( D(Q_X\|P) + \min\{R_e(Q_X, D_e),\ r + R(Q^*_{R'}(Q_X), D_e)\} - 8\epsilon \big) \Big\} \\
&= (n+1)^{|\mathcal{X}|} \exp\Big( -n \min\Big\{ \min_{Q_X \in \mathcal{Q}^n_X(\alpha,\delta)} D(Q_X\|P) + \min\{R_e(Q_X, D_e),\ r + R(Q^*_{R'}(Q_X), D_e)\}, \\
&\qquad\qquad \min_{Q_X \notin \mathcal{Q}^n_X(\alpha,\delta)} D(Q_X\|P) + R_e(Q_X, D_e) \Big\} + 8\epsilon n \Big) \\
&= (n+1)^{|\mathcal{X}|} \exp\Big( -n \min\Big\{ \min_{Q_X \in \mathcal{Q}^n_X(\alpha,\delta)} D(Q_X\|P) + r + R(Q^*_{R'}(Q_X), D_e), \\
&\qquad\qquad \min_{Q_X \in \mathcal{Q}^n_X} D(Q_X\|P) + R_e(Q_X, D_e) \Big\} + 8\epsilon n \Big),
\end{aligned} \tag{38}
$$

where (a) follows from (34) of Lemma 9 and the fact that the codebooks $\{C_k\}_{k=1}^{2^{nr}} \notin \tilde{\mathcal{E}}$ by construction. Therefore,

$$
\begin{aligned}
E^-(P, \vec{D}, \vec{R}, \alpha)
&= \liminf_{n \to \infty} \max_{\{f_n\}} \min_{\{g_n\}} -\frac{1}{n} \log \Pr\big(d_e(X^n, g_n(f_n(X^n, K))) \le D_e\big) \\
&\ge \min\Big\{ E_0(P, D_e),\ r + \min_{Q:\, D(Q\|P) \le \alpha+\delta} D(Q\|P) + R(Q, R', D, D_e) \Big\} - 8\epsilon,
\end{aligned}
$$

where the inequality follows from the following proposition, the proof of which is given in Appendix F.

Proposition 10:

$$\lim_{n \to \infty} \min_{Q_X \in \mathcal{Q}^n_X(\alpha,\delta)} D(Q_X\|P) + R(Q^*_{R'}(Q_X), D_e) = \min_{Q:\, D(Q\|P) \le \alpha+\delta} D(Q\|P) + R(Q, R', D, D_e).$$

Now, note that $\epsilon$ is arbitrary, and

$$\lim_{R' \to R}\ \min_{Q:\, D(Q\|P) \le \alpha+\delta} D(Q\|P) + R(Q, R', D, D_e) = \min_{Q:\, D(Q\|P) \le \alpha+\delta} D(Q\|P) + R(Q, R, D, D_e), \tag{39}$$

since $R(Q, \tilde{R}, D, D_e)$ is uniformly continuous in $(Q, \tilde{R})$ over the set $\{(Q, \tilde{R}) : D(Q\|P) \le \alpha + \delta,\ R' \le \tilde{R} \le R\}$ by Proposition 6. Finally, it follows from Proposition 6 and Proposition 13 (to follow in Appendix A) that

$$\lim_{\delta \to 0}\ \min_{Q:\, D(Q\|P) \le \alpha+\delta} D(Q\|P) + R(Q, R, D, D_e) = \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e). \tag{40}$$

As such,

$$E^-(P, \vec{D}, \vec{R}, \alpha) \ge \min\Big\{ E_0(P, D_e),\ r + \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e) \Big\}.$$

C. Converse Proof

We now prove that $E^+(P, \vec{D}, \vec{R}, \alpha) \le \min\{ r + \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e),\ E_0(P, D_e) \}$. We have already shown, following Theorem 1, that the perfect secrecy exponent $E_0(P, D_e)$ is achievable by the eavesdropper even in the absence of any observation. It follows immediately that

$$E^+(P, \vec{D}, \vec{R}, \alpha) \le E_0(P, D_e). \tag{41}$$

So we only need to demonstrate a strategy that achieves the first exponent. The strategy is based on the one suggested in Section III-B. We will add an initial stage in which the eavesdropper tries to guess the value of $K$, by choosing an element uniformly at random from $\{1, 2, \ldots, 2^{nr}\}$. The eavesdropper's guess, denoted by $\hat{K}$, is equal to $K$ with probability $2^{-nr}$ (this will correspond to the $r$ term in (25)). Then, s/he generates $\hat{Y}^n = h(M, \hat{K})$. Next, the eavesdropper implements the same stages suggested in Section III-B, where $\hat{Y}^n$ plays the role of $Y^n$. We denote the strategy by $g'$.

Remark 11: If $h$ is stochastic, it can be replaced by a deterministic $h$ that still satisfies the reliability constraint. Since this does not change the conditional $P_{X^n|M}$, we can assume, without loss of generality, that $h$ is deterministic.

Now, consider any functions $f$ and $h$ implemented by the primary user (and satisfying the distortion constraint). Let $P_f$ denote the induced joint probability of $(X^n, M, K)$, and $P_K$ denote the distribution of $K$. To analyze the performance of $g'$, note that, unlike Section III, not every realization of $Y^n$ necessarily satisfies the distortion constraint. To that end, define


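$$\mathcal{M}_D(x^n, k) = \{ m \in \mathcal{M} : d(x^n, h(m, k)) \le D \}, \quad x^n \in \mathcal{X}^n,\ k \in \mathcal{K}, \tag{42}$$

$$\text{and} \quad A = \{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : d(x^n, y^n) > D \}. \tag{43}$$

The distortion constraint implies that

$$P_f(A) \le 2^{-n\alpha}. \tag{44}$$

Moreover, the analysis of $g'$ should take into account the rate constraint $R$. The following lemma by Weissman and Ordentlich [30] will be instrumental.

Lemma 11 ([30, Lemma 3]): Let $Y^n(\cdot)$ be an $n$-block code of rate $\le R$. Then, for every $Q \in \mathcal{Q}^n_X$ and $\eta > 0$, if $X^n$ is uniformly distributed over $T_Q$,

$$\Pr\big( \{ x^n \in T_Q : I_{Q_{x^n Y^n(x^n)}}(X;Y) > R + \eta \} \big) \le (n+1)^{|\mathcal{X}||\mathcal{Y}| + |\mathcal{X}|}\, 2^{-n\eta}. \tag{45}$$

Remark 12: This is not the exact statement found in [30], but it is a straightforward modification.

So, define for every $\eta > 0$, $x^n \in \mathcal{X}^n$, and $k \in \mathcal{K}$,

$$\mathcal{M}_R(x^n, k, \eta) = \{ m \in \mathcal{M} : I_{Q_{x^n y^n}}(X;Y) \le R + \eta, \text{ where } y^n = h(m, k) \}, \tag{46}$$

$$\mathcal{M}_{D,R}(x^n, k, \eta) = \mathcal{M}_D(x^n, k) \cap \mathcal{M}_R(x^n, k, \eta), \tag{47}$$

$$\text{and} \quad B(\eta) = \{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : I_{Q_{x^n y^n}}(X;Y) > R + \eta \}. \tag{48}$$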

Finally, fix e > 0, S > 0 and g > 0, and consider the 
following chain of inequalities. 

Pr (d e (x n ,g'(f(X n ,K))) <D e ) 

= E EE P(x n )P K (k)P f (m\x n ,ky 

x n ^X n /cG/C mG/VI 

Pf (x n £ B Dti ( g'{m })) 

> E E E P(x n )P K (k)P f (m\x n ,ky 

x n ^X' n k£JC m(z.J'YiD,R{x ri ,k,rj) 

Pf (x n £ B Dc (V(m))) 


= E E E P(x n )P K (k)P f (m\x n ,ky 

x n ^X n k£JC m£A4 d,R.( x n ,k,ri) 

Y J Px(k = k) P f ( x n £ B Dc ( g(h(m , *)))) 
fee;c 

— ‘2~ nr J2 E E P{x n )P K {k)P f {m\x n ,k)- 

x n £X n fcG/C m£AiD, r( xTI , k,rj) 

Pf ( x n £ B Da ( g{h{m,k )))) 

( a ) 

> 42 -™- E E E P{x n )P K {k)P f {m\x n ,k)- 

x n £X n k£JC mGMo,R(x' 1 ,k,r]) 

2 


> c ' n 2~ nr ZEE E P{x n )P K {k\ 

Qx £ Q n x xTI £Tq fcG/C mGAIo ,R(x n ,k,rj) 

Pf(m\x n ,k) min 2 - " ip n (Q xr)*XXI 1 ) 

QxY&Qxy(_Qx,D)-. 

Iq xy (X;Y)<R+ v 

= c' n 2~ nr V min 2~ nIp XQxY) ( - x X\ Y l. 

QxveQZyiQxP): 

I Qxy (X-,Y)<R+ti 

P f (A c nB c {g)nT Q ) 

> c' n 2~ nr V min 2 -n(R(Q XY , D .)+e). 

n ,,QxYeQxy(Qx,D)-. 

Qx€Q x (a,—S) Iqxy (X;Y)<R+ v 

P f (A c nB c (g)nT Q ) 

> c' n 2~ nr E 2 ~n(R(Q x ,R-\-ri,D,D e )+e) # 

Qx£Qx( a ’— 

Pf{A c nB c (g)nT Q ) (49) 


where 

(a) follows from Lemma [ 5 ] and (42) : to £ M.D{x n ,k) 
guarantees that d(x n , h(m, k )) < D so that Lemma[5]is 
applicable. 

(b) follows from (47): m £ Md,r{x u , k,g) allows us 
to restrict the minimum to Qxy £ Qxy{Qx,D) 
satisfying I Qxy (X; Y) < R + g. 

(c) follows from Proposition [14] in Appendix [B] 

Now, note that 

Pf(A\T Q ) < 1 EE <(n+ i)\x\ 2 -n(a-DmP))_ (50) 

pQq) 

Moreover, by Lemma El 

P f (B(g)\T Q ) < {n + l) lxlw+lxl 2~ nri . (51) 


Combining (50) and (5l) yields, for every Qx £ (o:, — 6 ), 


P f (A c nB c (g)nT Q ) 

= P(T Q )Pf{A c PB c {g)\T Q ) 

> P(Tq)( 1 - Pf(A\T Q ) - P f (B(g)\T Q )) 

> P{T q ){ 1 - (n + iy x ^2~ nS - (n + 1 )l*l|y|+l^l2-’*’») 

> P(T Q )/2 , (52) 


where the last inequality holds for large enough $n$. Continuing from (49), we get

$$\begin{aligned}
\Pr\bigl(d_e(X^n, g(f(X^n,K))) \le D_e\bigr)
&\ge (c'_n/2)\, 2^{-nr} (n+1)^{-|\mathcal{X}|} \sum_{Q_X \in \mathcal{Q}^n_X(\alpha,-\delta)} 2^{-n\left(D(Q_X\|P) + R(Q_X, R+\eta, D, D_e) + \epsilon\right)} \\
&\ge (c'_n/2)\, (n+1)^{-|\mathcal{X}|} \exp\Bigl\{-n\Bigl(\epsilon + r + \min_{Q_X \in \mathcal{Q}^n_X(\alpha,-\delta)} D(Q_X\|P) + R(Q_X, R+\eta, D, D_e)\Bigr)\Bigr\}.
\end{aligned} \tag{53}$$

Remark 13: The proposition generalizes Lemma 2.2.2 in [28], which shows the continuity of the regular rate-distortion function, and the proof follows along similar lines.


Therefore, taking the limit as n goes to infinity, and noting 
that e, S, and r/ are arbitrary, we get 

E + (P 7 ~6^ 7 a) = 

$$\begin{aligned}
\limsup_{n\to\infty}\, \max_{f_n} \min_{g_n}\, -\frac{1}{n} \log \Pr\bigl(d_e(X^n, g_n(f_n(X^n,K))) \le D_e\bigr)
&\le \lim_{\delta \to 0} \lim_{\eta \to 0}\; r + \min_{Q:\, D(Q\|P) \le \alpha - \delta} D(Q\|P) + R(Q, R+\eta, D, D_e) \\
&= r + \min_{Q:\, D(Q\|P) \le \alpha} D(Q\|P) + R(Q, R, D, D_e),
\end{aligned} \tag{54}$$

where the last equality follows similarly to equations (39) and (40). Combining (41) and (54) yields our result.
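
The requirement that (52) hold "for large enough $n$" can be made concrete numerically. The sketch below searches for the smallest $n$ beyond which the two polynomial-times-exponential terms in (52) sum to at most $1/2$. The alphabet sizes, the slack values for $\delta$ and $\eta$, the search cap `n_max`, and the function names are illustrative assumptions, not values from the paper.

```python
import math

def bound_52_holds(n, x_size, y_size, delta, eta):
    """True if (n+1)^|X| 2^(-n*delta) + (n+1)^(|X||Y|+|X|) 2^(-n*eta) <= 1/2."""
    log_np1 = math.log2(n + 1)
    e1 = x_size * log_np1 - n * delta                    # log2 of the first term
    e2 = (x_size * y_size + x_size) * log_np1 - n * eta  # log2 of the second term
    if max(e1, e2) > 0:  # one term alone already exceeds 1
        return False
    return 2.0 ** e1 + 2.0 ** e2 <= 0.5

def smallest_n_for_52(x_size, y_size, delta, eta, n_max=100_000):
    """Smallest n such that the bound holds for every n' in [n, n_max]."""
    if not bound_52_holds(n_max, x_size, y_size, delta, eta):
        raise ValueError("n_max is too small for these parameters")
    n = n_max
    # Walk downward until the bound first fails.
    while n > 1 and bound_52_holds(n - 1, x_size, y_size, delta, eta):
        n -= 1
    return n

# Hypothetical parameters: binary alphabets, delta = eta = 0.1.
n0 = smallest_n_for_52(2, 2, 0.1, 0.1)
print(n0)
```

The threshold grows quickly as $\delta$ or $\eta$ shrink, which is consistent with taking $n \to \infty$ before letting the slack parameters vanish.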


Appendix A 

Proof of Proposition)]] 

A. Proof of Property (PI) 

(P1): For fixed $P_{XY}$, $R(P_{XY}, D_e)$ is a finite-valued, non-increasing convex function of $D_e$. Furthermore, $R(P_{XY}, D_e)$ is a uniformly continuous function of the pair $(P_{XY}, D_e)$.

Fix $P_{XY}$. The minimization defining $R(P_{XY}, D_e)$ is over a compact set, which is non-empty due to assumption (A3). Since $I(X;V|Y)$ is a continuous function of $P_{V|X,Y}$, the minimum is achieved. The monotonicity in $D_e$ follows directly from the definition. It is easy to check that $I(X;V|Y)$ is convex in $P_{V|X,Y}$ for fixed $P_{XY}$. Then, the proof of the convexity of $R(P_{XY}, D_e)$ in $D_e$ follows similarly to the case of the rate-distortion function with no side information (see Lemma 2.2.2 in [28]).


To show the uniform continuity in the pair $(P_{XY}, D_e)$, consider the following proposition, the proof of which is given in Appendix G-A.

Proposition 12: Let $N_1$ and $N_2$ be in $\mathbb{N}$, and let $\mathcal{S}$ and $\mathcal{U}$ be compact subsets of $\mathbb{R}^{N_1}$ and $\mathbb{R}^{N_2}$, respectively. Let $\nu$ be a non-negative continuous function defined on $\mathcal{S} \times \mathcal{U}$, and let $\vartheta$ be a real-valued continuous function defined on $\mathcal{S} \times \mathcal{U}$. Suppose they satisfy the following condition:

(PA) If $(s, u_1) \in \mathcal{S} \times \mathcal{U}$ satisfies $\nu(s, u_1) = \min_{u' \in \mathcal{U}} \nu(s, u')$, then there exists $u_2$ such that $\vartheta(s, u_2) = \vartheta(s, u_1)$, and for all $s' \in \mathcal{S}$, $\nu(s', u_2) = \min_{u' \in \mathcal{U}} \nu(s', u')$.

Let $t_0 = \max_{s \in \mathcal{S}} \min_{u \in \mathcal{U}} \nu(s, u)$, and let $\varphi$ be a function on $\mathcal{S} \times [t_0, +\infty)$ defined as follows:

$$\varphi(s,t) = \min_{u:\, \nu(s,u) \le t} \vartheta(s,u).$$

If for fixed $s \in \mathcal{S}$, $\varphi(s,t)$ is continuous in $t$, then $\varphi(s,t)$ is continuous in the pair $(s,t)$.


The proposition yields immediately the continuity of $R(P_{XY}, D_e)$ by identifying $\mathcal{S}$ with $\mathcal{P}_{XY}$, $\mathcal{U}$ with the set of conditional probability distributions $P_{V|XY}$, $t_0$ with $D_{e,\min}$, and the functions $\nu$, $\vartheta$, and $\varphi$ with $E[d_e(X,V)]$, $I(X;V|Y)$, and $R(P_{XY}, D_e)$ respectively. It is easy to check that $D_{e,\min} = \max_{P_{XY}} \min_{P_{V|XY}} E[d_e(X,V)]$, so that we can identify it with $t_0$. To see why $E[d_e(X,V)]$ and $I(X;V|Y)$ satisfy (PA), note the following. For notational convenience, we write $E[d_e(X,V)]$ as $d_e(P_{XY}, P_{V|XY})$, and $I(X;V|Y)$ as $I(P_{XY}, P_{V|XY})$. Suppose $d_e(P_{XY}, P_{V|XY}) = \min_{P'_{V|XY}} d_e(P_{XY}, P'_{V|XY})$ and let $D_e(x) = \min_{v \in \mathcal{V}} d_e(x,v)$ for $x \in \mathcal{X}$. Then for all $(x,v)$ such that $d_e(x,v) > D_e(x)$, $P_{XV}(x,v) = 0$. Expanding $P_{XV}(x,v)$:

$$P_{XV}(x,v) = \sum_{y} P_{XY}(x,y)\, P_{V|XY}(v|x,y) = 0 \;\Rightarrow\; \text{for all } y \in \mathcal{Y},\ P_{XY}(x,y) = 0 \text{ or } P_{V|XY}(v|x,y) = 0.$$

Then, define $\tilde{P}_{V|XY}$ as follows:

* If $P_{XY}(x,y) > 0$, let $\tilde{P}_{V|X=x,Y=y} = P_{V|X=x,Y=y}$.

* If $P_{XY}(x,y) = 0$, let $\tilde{P}_{V|X=x,Y=y}$ satisfy $\tilde{P}_{V|X=x,Y=y}(v|x,y) = 0$ if $d_e(x,v) > D_e(x)$.

Then $P_{XY}\tilde{P}_{V|XY} = P_{XY}P_{V|XY}$, thus $I(P_{XY}, \tilde{P}_{V|XY}) = I(P_{XY}, P_{V|XY})$. Moreover, the definition of $\tilde{P}_{V|XY}$ guarantees that $d_e(x,v) > D_e(x) \Rightarrow \tilde{P}_{V|XY}(v|x,y) = 0$ for all $y$. Therefore, for any joint distribution $P'_{XY}$, $d_e(P'_{XY}, \tilde{P}_{V|XY}) = \min_{P_{V|XY}} d_e(P'_{XY}, P_{V|XY})$.

Finally, to prove uniform continuity, note that $R(P_{XY}, D_e) = R(P_{XY}, D_{e,\max})$ for all $D_e \ge D_{e,\max}$. Therefore, $R(P_{XY}, D_e)$ is uniformly continuous on the set $\mathcal{P}_{XY} \times [D_{e,\max}, \infty)$. Since it is also uniformly continuous on $\mathcal{P}_{XY} \times [D_{e,\min}, D_{e,\max}]$, the result is established. ■

B. Proof of Property (P2) 

(P2): For fixed $P_X$, $R(P_X, D, D_e)$ is a finite-valued function of $(D, D_e)$. Moreover, for fixed $D_e$, $R(P_X, D, D_e)$ is a uniformly continuous function of the pair $(P_X, D)$.

Fix $P_X$. The maximization defining $R(P_X, D, D_e)$ is over a compact set, which is non-empty due to assumption (A3). Since $R(P_{XY}, D_e)$ is a continuous function of $P_{XY}$, it is also continuous in $P_{Y|X}$ for fixed $P_X$. Therefore, the maximum is achieved.

As for the continuity in $(P_X, D)$ for fixed $D_e$, we view $R(P_X, D, D_e)$ as a function of $(P_X, D)$, and $R(P_{XY}, D_e)$ as a function of $(P_X, P_{Y|X})$. In the terminology of Proposition 12, we identify $\mathcal{S}$ with $\mathcal{P}_X$, $\mathcal{U}$ with the set of conditional probability distributions $P_{Y|X}$, $t_0$ with $D_{\min}$, and the functions $\nu$, $\vartheta$, and $\varphi$ with $E[d(X,Y)]$, $-R(P_{XY}, D_e)$, and $-R(P_X, D, D_e)$ respectively. Proving that $E[d(X,Y)]$ and $R(P_{XY}, D_e)$ satisfy (PA) follows along the same lines as proving that $E[d_e(X,V)]$ and $I(X;V|Y)$ satisfy (PA). Moreover, if continuity holds, uniform continuity follows from the fact that $R(P_X, D, D_e)$ is constant for all $D \ge D_{\max}$.

It remains to show that $R(P_X, D, D_e)$ is a continuous function of $D$ for fixed $P_X$ and $D_e$. The result of Proposition 12 then applies immediately. To this end, consider the following proposition, the proof of which is given in Appendix G-B.

Proposition 13: Let $N$ be in $\mathbb{N}$, and let $\mathcal{T}$ be a non-empty compact subset of $\mathbb{R}^N$. Let $L$ be a real-valued continuous function defined on $\mathcal{T}$. Let $T_1 \supseteq T_2 \supseteq \cdots$ be a decreasing sequence of non-empty compact subsets of $\mathcal{T}$. Let $T = \bigcap_{i \ge 1} T_i$. Then,

$$\lim_{k \to \infty} \max_{t \in T_k} L(t) = \max_{t \in T} L(t).$$

Moreover, let $S_1 \subseteq S_2 \subseteq \cdots$ be an increasing sequence of non-empty compact subsets of $\mathcal{T}$. Let $S = \overline{\bigcup_{i \ge 1} S_i}$ (where the bar denotes closure of the set). Then

$$\lim_{k \to \infty} \max_{t \in S_k} L(t) = \max_{t \in S} L(t).$$

Consequently, if $\mathcal{T}$ is also convex, and $L_c$ is a real-valued convex and continuous function defined on $\mathcal{T}$ with $s_0 = \min_{t \in \mathcal{T}} L_c(t)$, then

$$\tilde{L}(s) := \max_{t:\, L_c(t) \le s} L(t)$$

is continuous in $s \in [s_0, +\infty)$.

It follows immediately then that $R(P_X, D, D_e)$ is continuous in $D$ for fixed $P_X$ and $D_e$, since $E[d(X,Y)]$ is convex and continuous in $P_{Y|X}$, and $R(P_{XY}, D_e)$ is continuous in $P_{Y|X}$ (for fixed $P_X$).


C. Proof of Property (P3) 

(P3): $R_e(P_X, D_e) - R(P_X, D) \le R(P_X, D, D_e) \le R_e(P_X, D_e)$.

The upper bound is straightforward since $R(P_{XY}, D_e)$, the rate-distortion function with side information, is always upper-bounded by $R_e(P_X, D_e)$. The lower bound is derived by considering a conditional $P^*_{Y|X}$ that achieves the rate-distortion function. Writing $P^* = P_X P^*_{Y|X}$,

$$\begin{aligned}
R(P_X, D, D_e)
&= \max_{P_{Y|X}:\, E[d(X,Y)] \le D}\ \min_{P_{V|X,Y}:\, E[d_e(X,V)] \le D_e} I(X;V|Y) \\
&\ge \min_{P_{V|X,Y}:\, E[d_e(X,V)] \le D_e} H_{P^*}(X|Y) - H_{P^* P_{V|XY}}(X|V,Y) && (55) \\
&\ge \min_{P_{V|X,Y}:\, E[d_e(X,V)] \le D_e} H_{P^*}(X|Y) - H_{P_X P_{V|X}}(X|V) && (56) \\
&= -H_{P_X}(X) + H_{P^*}(X|Y) + \min_{P_{V|X,Y}:\, E[d_e(X,V)] \le D_e} \bigl[ H_{P_X}(X) - H_{P_X P_{V|X}}(X|V) \bigr] \\
&= -R(P_X, D) + R_e(P_X, D_e).
\end{aligned}$$


Appendix B 

Proof of Proposition!!] 

First, consider the following proposition. 

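Proposition 14: For all $\epsilon > 0$, there exists $n_2(\epsilon, |\mathcal{X}|, |\mathcal{Y}|, |\mathcal{V}|)$ such that for all $n \ge n_2$, for all $D_e \ge D_{e,\min}$, and for each $Q_{XY} \in \mathcal{Q}^n_{XY}$,

$$\Bigl| \min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X;V|Y) - R(Q_{XY}, D_e) \Bigr| \le \epsilon.$$

Proof: It follows directly from the definition that

$$\min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X;V|Y) \ge R(Q_{XY}, D_e).$$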

So, we only need to show the other direction. To that end, let $\delta > 0$ be small enough such that

$$\|P_{XYV} - P'_{XYV}\| \le \delta \;\Rightarrow\; \bigl| I_{P_{XYV}}(X;V|Y) - I_{P'_{XYV}}(X;V|Y) \bigr| \le \epsilon, \tag{57}$$

where $\|\cdot\|$ is used to indicate the $L_2$-norm. Let $n \ge n_2 \ge |\mathcal{V}| \sqrt{|\mathcal{X}||\mathcal{Y}||\mathcal{V}|}/\delta$. Fix $Q_{XY} \in \mathcal{Q}^n_{XY}$, and let $P^*_{V|XY}$ be the conditional distribution achieving the minimum in $R(Q_{XY}, D_e)$. We construct a conditional distribution $P'_{V|XY}$ as follows. For each $(x,y) \in \mathcal{X} \times \mathcal{Y}$, we will choose $P'_{V|X=x,Y=y}$ from $\mathcal{Q}_V^{nQ_{XY}(x,y)}$, i.e., the set of rational PMFs over $\mathcal{V}$ with denominator $nQ_{XY}(x,y)$ (if $Q_{XY}(x,y) = 0$, then we can choose $P'_{V|X=x,Y=y}$ to be any distribution). This guarantees that $Q_{XY}P'_{V|XY}$ is in $\mathcal{Q}^n_{XYV}$. Let $v(x) = \arg\min_{v \in \mathcal{V}} d_e(x,v)$ for $x \in \mathcal{X}$ (if more than one $v$ achieves the minimum, choose one arbitrarily). We construct $P'_{V|XY}$ by rounding $P^*_{V|XY}$ as follows. For each $(x,y) \in \mathcal{X} \times \mathcal{Y}$ and each $v \ne v(x)$, we set $P'_{V|XY}(v|x,y)$ to be the largest integer multiple of $1/(nQ_{XY}(x,y))$ that is smaller than $P^*_{V|XY}(v|x,y)$, i.e., we round down with resolution $1/(nQ_{XY}(x,y))$ and denote this operation by $\lfloor \cdot \rfloor_{nQ_{XY}(x,y)}$. Finally, we set $P'_{V|XY}(v(x)|x,y)$ appropriately to make $P'_{V|XY}(\cdot|x,y)$ a valid probability distribution. It is easy to see that, for such a choice,

$$\bigl| P'_{V|XY}(v|x,y) - P^*_{V|XY}(v|x,y) \bigr| \le \frac{|\mathcal{V}|}{nQ_{XY}(x,y)}.$$

Moreover, this readily implies that

$$\bigl\| Q_{XY}P'_{V|XY} - Q_{XY}P^*_{V|XY} \bigr\| \le \frac{|\mathcal{V}| \sqrt{|\mathcal{X}||\mathcal{Y}||\mathcal{V}|}}{n} \le \delta. \tag{58}$$

Let $P_{XYV} = Q_{XY}P^*_{V|XY}$ and $P'_{XYV} = Q_{XY}P'_{V|XY}$. Now, note that

$$\begin{aligned}
D_e &\ge E_{P_{XYV}}[d_e(X,V)] \\
&= \sum_{x,y} \sum_{v} Q_{XY}(x,y)\, P^*_{V|XY}(v|x,y)\, d_e(x,v) \\
&= \sum_{x,y} \sum_{v} Q_{XY}(x,y)\, \lfloor P^*_{V|XY}(v|x,y) \rfloor_{nQ_{XY}(x,y)}\, d_e(x,v) \\
&\quad + \sum_{x,y} \sum_{v} Q_{XY}(x,y) \bigl( P^*_{V|XY}(v|x,y) - \lfloor P^*_{V|XY}(v|x,y) \rfloor_{nQ_{XY}(x,y)} \bigr)\, d_e(x,v) \\
&\ge \sum_{x,y} \sum_{v} Q_{XY}(x,y)\, \lfloor P^*_{V|XY}(v|x,y) \rfloor_{nQ_{XY}(x,y)}\, d_e(x,v) \\
&\quad + \sum_{x,y} \sum_{v} Q_{XY}(x,y) \bigl( P^*_{V|XY}(v|x,y) - \lfloor P^*_{V|XY}(v|x,y) \rfloor_{nQ_{XY}(x,y)} \bigr)\, d_e(x,v(x)) \\
&= E_{P'_{XYV}}[d_e(X,V)].
\end{aligned}$$

Therefore,

$$\begin{aligned}
\min_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} I_{P_{XYV}}(X;V|Y)
&\le I_{P'_{XYV}}(X;V|Y) \\
&\le I_{P_{XYV}}(X;V|Y) + \epsilon \\
&= R(Q_{XY}, D_e) + \epsilon,
\end{aligned}$$

where the second inequality follows from (57) and (58).

■
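
The rounding step in the proof above can be illustrated with exact rational arithmetic. The following is a minimal sketch, not the authors' code: it rounds every entry of a PMF except one anchor symbol down to a multiple of $1/m$ and assigns the leftover mass to the anchor. Here `m` plays the role of $nQ_{XY}(x,y)$ and `anchor` the role of $v(x)$; both names, and the example numbers, are assumptions made for illustration.

```python
from fractions import Fraction
import math

def round_down_pmf(pmf, m, anchor):
    """Round all entries except `anchor` down to multiples of 1/m,
    then give the anchor whatever mass is left so the result sums to 1."""
    rounded = [Fraction(math.floor(Fraction(p) * m), m) for p in pmf]
    rounded[anchor] = 1 - sum(r for i, r in enumerate(rounded) if i != anchor)
    return rounded

def expected_distortion(pmf, dist_row):
    """Expected distortion of a PMF against one row d_e(x, .) of the distortion matrix."""
    return sum(Fraction(p) * Fraction(d) for p, d in zip(pmf, dist_row))

# Hypothetical example: uniform PMF on three symbols, resolution 1/10,
# anchor at the symbol with the smallest distortion.
p_star = [Fraction(1, 3)] * 3
d_row = [0, 1, 1]
p_prime = round_down_pmf(p_star, 10, anchor=0)
print(p_prime, expected_distortion(p_star, d_row), expected_distortion(p_prime, d_row))
```

Because mass only moves toward the minimum-distortion symbol, the expected distortion cannot increase, which mirrors the chain of inequalities ending in $E_{P'_{XYV}}[d_e(X,V)]$.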

Similarly, we have the following proposition. 

Proposition 15: For all $\epsilon > 0$, there exists $n_3(\epsilon, |\mathcal{X}|, |\mathcal{Y}|, d_e)$ such that for all $n \ge n_3$, $D \ge D_{\min}$, $D_e \ge D_{e,\min}$, and for each $Q_X \in \mathcal{Q}^n_X$,

$$\Bigl| \max_{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D)} R(Q_{XY}, D_e) - R(Q_X, D, D_e) \Bigr| \le \epsilon.$$

The proof follows along the same lines as that of Proposition 14 and is thus omitted. ■

By the previous two propositions, for any given $\epsilon > 0$, we can set $n$ large enough to satisfy

$$\bigl| I_{P^*(Q_X)}(X;V|Y) - R(Q_X, D, D_e) \bigr| \le \epsilon$$

for all $Q_X \in \mathcal{Q}^n_X$. Therefore,

$$\begin{aligned}
\min_{Q_X \in \mathcal{Q}^n_X} D(Q_X\|P) + R(Q_X, D, D_e) - \epsilon
&\le \min_{Q_X \in \mathcal{Q}^n_X} D(Q_X\|P) + I_{P^*(Q_X)}(X;V|Y) \\
&\le \min_{Q_X \in \mathcal{Q}^n_X} D(Q_X\|P) + R(Q_X, D, D_e) + \epsilon.
\end{aligned}$$

By taking the limit as $n$ goes to infinity, and noting that $\epsilon$ is arbitrary, the proof is concluded.


Appendix C 

Proof of Proposition^ 

A. Proof of Property (P4) 

(P4): For fixed $P_X$, $R(P_X, R, D, D_e)$ is a finite-valued function of $(R, D, D_e)$. Moreover, for fixed $D_e$, $R(P_X, R, D, D_e)$ is continuous in the triple $(P_X, R, D)$ over the set $\mathcal{S} = \{(P_X, R, D) : P_X \in \mathcal{P}_X,\ D \ge D_{\min},\ R > R(P_X, D)\}$.

Recall, for $P_X$ satisfying $R(P_X, D) \le R$,

$$R(P_X, R, D, D_e) = \max_{\substack{P_{Y|X}:\ E[d(X,Y)] \le D \\ I(X;Y) \le R}} R(P_{XY}, D_e).$$

For fixed $P_X$, let $\mathcal{S}_{D,R} = \{P_{Y|X} : E[d(X,Y)] \le D,\ I(X;Y) \le R\}$. Then $\mathcal{S}_{D,R}$ is compact, and non-empty since $D \ge D_{\min}$ and $R \ge R(P_X, D)$. Since $R(P_{XY}, D_e)$ is a continuous function of $P_{XY}$ (by (P1)), it is also continuous in $P_{Y|X}$ for fixed $P_X$. Therefore, the maximum is achieved.

To prove continuity of $R(P_X, R, D, D_e)$ in $(P_X, R, D)$, first consider the following claims.

Claim 1: For fixed $P_X$, $D_e$, and $D$, $R(P_X, R, D, D_e)$ is continuous in $R$, where $R \in [R(P_X, D), +\infty)$.

This follows from the third part of Proposition 13 in Appendix A by identifying $\mathcal{T}$ with $\{P_{Y|X} : E[d(X,Y)] \le D\}$ (which is compact, convex, and non-empty since $D \ge D_{\min}$), $L_c$ with $I(X;Y)$, which is convex and continuous in $P_{Y|X}$, $s_0$ with $R(P_X, D)$, $L$ with $R(P_X P_{Y|X}, D_e)$, and $\tilde{L}$ with $R(P_X, R, D, D_e)$.

Claim 2: For fixed $P_X$, $D_e$, and $R$, $R(P_X, R, D, D_e)$ is continuous in $D$, where $D \in [D(P_X, R), +\infty)$ and $D(P_X, R) := \min_{P_{Y|X}:\, I(X;Y) \le R} E[d(X,Y)]$ is the distortion-rate function.

This follows from a similar argument.

We are now ready to prove continuity in the triple $(P_X, R, D)$ over $\mathcal{S} = \{(P_X, R, D) : P_X \in \mathcal{P}_X,\ D \ge D_{\min},\ R > R(P_X, D)\}$.

To that end, fix any $(P, R, D) \in \mathcal{S}$ and consider any sequence $(P_k, R_k, D_k)$ converging to $(P, R, D)$. First, we show that $\liminf_{k \to \infty} R(P_k, R_k, D_k, D_e) \ge R(P, R, D, D_e)$. Consider any $\epsilon > 0$. By continuity of $R(P, R, D, D_e)$ in $R$ (for fixed $P$, $D$, and $D_e$), we can choose $R'$ such that $R(P, D) < R' < R$ and $R(P, R', D, D_e) \ge R(P, R, D, D_e) - \epsilon/2$. We now consider two cases depending on the value of $D$. Let $D_0 = \min_{P_{Y|X}} E[d(X,Y)]$.

If $D > D_0$: note that $D(P, R)$ is non-increasing in $R$, therefore $D'(P, R) \le 0$. Moreover, it is convex in $R$ and $R(P, D)$ does not achieve its minimum ($D(P, R(P,D)) = D > D_0$), hence $D'(P, R(P,D)) < 0$. Therefore, $R' > R(P, D) \Rightarrow D(P, R') < D$. Now choose $D'$ such that $D(P, R') < D' < D$ and $R(P, R', D', D_e) \ge R(P, R', D, D_e) - \epsilon/2$. Let $P^*_{Y|X}$ be a maximizer for $R(P, R', D', D_e)$.

If $D = D_0$: set $D' = D$ and let $P'_{Y|X}$ be a maximizer for $R(P, R', D, D_e)$. Let $D(x) = \min_{y \in \mathcal{Y}} d(x,y)$ for $x \in \mathcal{X}$. Then $P'_{Y|X}$ must satisfy the following property: for all $(x,y)$ such that $d(x,y) > D(x)$, $P(x) = 0$ or $P'_{Y|X}(y|x) = 0$. We can construct $P^*_{Y|X}$ such that $d(x,y) > D(x) \Rightarrow P^*_{Y|X}(y|x) = 0$, and $P(x) > 0 \Rightarrow P^*_{Y|X=x} = P'_{Y|X=x}$. As such, $P P^*_{Y|X} = P P'_{Y|X}$.

We claim that $P^*_{Y|X}$ is feasible for the maximization in $R(P_k, R_k, D_k, D_e)$ for sufficiently large $k$. Indeed, $I(P_k; P^*_{Y|X}) \to I(P; P^*_{Y|X}) \le R' < R$. Then for sufficiently large $k$, $I(P_k; P^*_{Y|X}) \le R_k$. Moreover, if $D > D_0$, then $E[d(P_k, P^*_{Y|X})] \to E[d(P, P^*_{Y|X})] \le D' < D$. Then for sufficiently large $k$, $E[d(P_k, P^*_{Y|X})] \le D_k$. Similarly, if $D = D_0$, then $E[d(P_k, P^*_{Y|X})] = \min_{P_{Y|X}} E[d(P_k, P_{Y|X})] \le D_{\min} \le D_k$, where the first equality follows from the construction of $P^*_{Y|X}$. So we get

$$\begin{aligned}
\liminf_{k \to \infty} R(P_k, R_k, D_k, D_e)
&\ge \liminf_{k \to \infty} R(P_k P^*_{Y|X}, D_e) \\
&= R(P P^*_{Y|X}, D_e) \\
&= R(P, R', D', D_e) \\
&\ge R(P, R, D, D_e) - \epsilon,
\end{aligned}$$

where the first equality follows from the continuity of $R(P_{XY}, D_e)$ in $P_{XY}$. Noting that $\epsilon$ is arbitrary, we get our first inequality.

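On the other hand, let $P^{(k)}_{Y|X}$ be a maximizer for $R(P_k, R_k, D_k, D_e)$. Consider a sequence of integers $\{k_j\}$ such that

$$R(P_{k_j}, R_{k_j}, D_{k_j}, D_e) \to \limsup_{k \to \infty} R(P_k, R_k, D_k, D_e).$$

Let $\{P^{(k_j)}_{Y|X}\}$ be the corresponding subsequence of maximizers. Since the set of conditional distributions $\{P_{Y|X}\}$ is bounded, $\{P^{(k_j)}_{Y|X}\}$ has a convergent subsequence $\{P^{(k_{j_\ell})}_{Y|X}\}$. Let $P^*_{Y|X}$ be its limit. We have $I(P; P^*_{Y|X}) = \lim_{\ell \to \infty} I(P_{k_{j_\ell}}; P^{(k_{j_\ell})}_{Y|X}) \le \lim_{\ell \to \infty} R_{k_{j_\ell}} = R$. Similarly, $E[d(P, P^*_{Y|X})] = \lim_{\ell \to \infty} E[d(P_{k_{j_\ell}}, P^{(k_{j_\ell})}_{Y|X})] \le \lim_{\ell \to \infty} D_{k_{j_\ell}} = D$. Therefore,

$$\begin{aligned}
R(P, R, D, D_e) &\ge R(P P^*_{Y|X}, D_e) \\
&= \lim_{\ell \to \infty} R(P_{k_{j_\ell}} P^{(k_{j_\ell})}_{Y|X}, D_e) \\
&= \lim_{\ell \to \infty} R(P_{k_{j_\ell}}, R_{k_{j_\ell}}, D_{k_{j_\ell}}, D_e) \\
&= \limsup_{k \to \infty} R(P_k, R_k, D_k, D_e).
\end{aligned}$$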


B. Proof of Property (P5)

(P5): $R_e(P_X, D_e) - R(P_X, D) \le R(P_X, R, D, D_e) \le R(P_X, D, D_e) \le R_e(P_X, D_e)$.

The upper bound follows straightforwardly from the definition and (P3). The lower bound follows from the proof of (P3). Indeed, the bound in (P3) was derived by considering a conditional $P^*_{Y|X}$ that achieves the rate-distortion function. As such, this choice is feasible since $I_{P^*}(X;Y) = R(P_X, D) \le R$.

Appendix D
Proof of Lemma 7

Note that the second equality follows simply from the evaluation of $R_e(Q, D_e) - R(Q, D)$. So we only need to show the first equality.

Note that (P3) asserts that $R(Q, D, D_e) \ge R_e(Q, D_e) - R(Q, D)$, so we only need to show the reverse direction. Moreover, if $H(Q) \le H(D)$, $R(Q, D) = 0$. It then follows from (P3) that $R(Q, D, D_e) = R_e(Q, D_e)$. It remains to show that, for $Q$ satisfying $H(Q) > H(D)$,

$$R(Q, D, D_e) \le R_e(Q, D_e) - R(Q, D).$$

Remark 14: The following proof was suggested by the reviewer, and it significantly simplifies our previous proof.

To that end, let $P_{Y|X}$ satisfy $E[d(X,Y)] \le D$, let $\tilde{X} = X \oplus Y$ and $\tilde{V} = V \oplus Y$, and consider

$$\begin{aligned}
\min_{P_{V|XY}:\, E[d(X,V)] \le D_e} I(X;V|Y)
&= \min_{P_{V|XY}:\, E[d(X \oplus Y,\, V \oplus Y)] \le D_e} I(X \oplus Y; V \oplus Y \mid Y) \\
&= \min_{P_{\tilde{V}|\tilde{X}Y}:\, E[d(\tilde{X},\tilde{V})] \le D_e} I(\tilde{X}; \tilde{V} \mid Y) \\
&\le \min_{\substack{P_{\tilde{V}|\tilde{X}}:\, E[d(\tilde{X},\tilde{V})] \le D_e \\ \tilde{V} - \tilde{X} - Y}} I(\tilde{X}; \tilde{V} \mid Y) \\
&\le \min_{P_{\tilde{V}|\tilde{X}}:\, E[d(\tilde{X},\tilde{V})] \le D_e} I(\tilde{X}; \tilde{V}) \\
&= [H(\tilde{X}) - H(D_e)]^+ \overset{(a)}{\le} H(D) - H(D_e),
\end{aligned}$$

where (a) follows from the fact that $\Pr(\tilde{X} = 1) = E[d(X,Y)] \le D$. Therefore,

$$R(Q, D, D_e) = \max_{P_{Y|X}:\, E[d(X,Y)] \le D}\ \min_{P_{V|XY}:\, E[d(X,V)] \le D_e} I(X;V|Y) \le H(D) - H(D_e) = R_e(Q, D_e) - R(Q, D),$$

as desired.
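
The closing identity $H(D) - H(D_e) = R_e(Q, D_e) - R(Q, D)$ can be checked numerically. The sketch below assumes a binary source with Hamming distortion and distortion levels in $[0, 1/2]$, for which the standard rate-distortion formula is $[H(Q) - H(D)]^+$; the function names and the sample values are illustrative assumptions.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def binary_rate_distortion(q, D):
    """[H(q) - H(D)]^+ for a Bernoulli(q) source, Hamming distortion, D in [0, 1/2]."""
    return max(h2(q) - h2(D), 0.0)

# Hypothetical values in the regime H(Q) > H(D) > H(D_e).
q, D, D_e = 0.5, 0.11, 0.05
gap = binary_rate_distortion(q, D_e) - binary_rate_distortion(q, D)
print(gap, h2(D) - h2(D_e))
```

In this regime both positive-part operators are inactive, so the difference of the two rate-distortion values collapses to $H(D) - H(D_e)$.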


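Appendix E
Proof of Lemma 9

Proof of 1): For $x^n \in T_{Q_X}$ and $m \in [N]$, let

$$N_{x^n,m} = \mathbb{1}\{(x^n, Y^n_m) \in T_{Q_{XY}}\}, \quad \text{so that} \quad N_{x^n} = \sum_{m=1}^{N} N_{x^n,m}.$$

Note that $N_{x^n,m} \sim \mathrm{Ber}(\beta)$, where

$$\beta = \Pr\bigl((x^n, Y^n_m) \in T_{Q_{XY}}\bigr) = \frac{|T_{Q_{Y|X}}(x^n)|}{|T_{Q_Y}|} \;\Rightarrow\; 2^{-n(I_{Q_{XY}}(X;Y) + \epsilon/2)} \le \beta \le 2^{-n(I_{Q_{XY}}(X;Y) - \epsilon/2)}.$$

Therefore,

$$\Pr(N_{x^n} = 0) = \Pr(N_{x^n,m} = 0,\ \forall m \in [N]) \overset{(a)}{=} \prod_{m=1}^{N} (1 - \beta) = (1-\beta)^N \overset{(b)}{\le} e^{-\beta N} \le e^{-2^{n\epsilon/6}}, \tag{60}$$

where (a) follows from the independence of $N_{x^n,m}$ for different $m$'s, and (b) follows from the fact that $(1-t)^N \le e^{-tN}$. On the other hand,

$$\Pr(N_{x^n} \ge 2^{2n\epsilon}) = \Pr\Bigl(\sum_{m=1}^{N} N_{x^n,m} \ge 2^{2n\epsilon}\Bigr) \overset{(a)}{\le} \bigl(e\, 2^{-n\epsilon/2}\bigr)^{2^{2n\epsilon}}, \tag{61}$$

where (a) follows from the Chernoff bound (cf. [25, Lemma 2]). Using equations (60) and (61) and the union bound, we get

$$\Pr(\mathcal{E}) \le |\mathcal{X}|^n \Bigl( \bigl(e\, 2^{-n\epsilon/2}\bigr)^{2^{2n\epsilon}} + e^{-2^{n\epsilon/6}} \Bigr) \le e^{-2^{n\epsilon/7}},$$

establishing 1). ■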
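
Step (b) in (60) relies on the elementary inequality $(1-t)^N \le e^{-tN}$. A minimal numerical sketch of that step follows; the function names and the sample values of $\beta$ and $N$ are assumptions chosen for illustration and are not tied to any particular blocklength.

```python
import math

def prob_no_hit(beta, N):
    """(1 - beta)^N, the probability that N independent Ber(beta) trials all fail.
    Computed through log1p for numerical stability when beta is tiny."""
    return math.exp(N * math.log1p(-beta))

def exponential_bound(beta, N):
    """The upper bound e^{-beta * N} used in step (b)."""
    return math.exp(-beta * N)

# Hypothetical values: the bound tightens as beta shrinks with beta * N fixed.
for beta, N in [(0.5, 10), (1e-3, 5_000), (1e-6, 5_000_000)]:
    print(beta, N, prob_no_hit(beta, N), exponential_bound(beta, N))
```

When $\beta N$ grows exponentially in $n$, as it does in the proof, both quantities are doubly exponentially small.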

Proof of 2): To show that ( |32| ) holds, consider C n f £, and 
( x n ,m ) where m £ C{x n ), 

r PM\X n ( m \x n ) 

P xn]M (x n \m) = — --- — c - p— 

Lx»6T q r M\X^\ m \ x ) 


< 


*x\ Y v 

1 


— 2 n ( H Q XY (X\Y)-e)2~2ne 
= 2~ n ( H Qx Y ( X \ Y )- 3 c)' 


( 62 ) 


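Then,

$$\begin{aligned}
\Pr\bigl(d_e(X^n, v^n) \le D_e \mid M = m, \mathcal{C}^n\bigr)
&= \sum_{\substack{x^n:\ d_e(x^n, v^n) \le D_e \\ (x^n, y^n_m) \in T_{Q_{XY}}}} P_{X^n|M}(x^n|m) \\
&\le \sum_{\substack{x^n:\ d_e(x^n, v^n) \le D_e \\ (x^n, y^n_m) \in T_{Q_{XY}}}} 2^{-n(H_{Q_{XY}}(X|Y) - 3\epsilon)} \\
&\le \max_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} 2^{n(H_{P_{XYV}}(X|Y,V) + \epsilon)}\, 2^{-n(H_{Q_{XY}}(X|Y) - 3\epsilon)} \\
&= \max_{P_{XYV} \in \mathcal{Q}^n_{XYV}(Q_{XY}, D_e)} 2^{-n(I_{P_{XYV}}(X;V|Y) - 4\epsilon)}.
\end{aligned}$$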

Proof of 3): For notational convenience, let $E = \min\{R_e(Q_X, D_e),\ r + R(Q_{XY}, D_e)\}$. Note that,


SJfe=l 


\Jfc=l 


< 2~ 4e) 


(63) 


Pr(£) < Pr ( U 5 fc ) + Pr ^ (J £ k 

nV|. 


< e 




Pr £ 


(64) 


fc=i 


where (a) follows from ( fl2| ). It remains to show ( [331 . To that 
end, note that, given y n £ y n and to £ [N], 


$$\Pr(Y^n_m = y^n \mid \mathcal{E}^c) \le \frac{\Pr(Y^n_m = y^n)}{\Pr(\mathcal{E}^c)} \le \frac{2^{-n(H_{Q_Y}(Y) - \epsilon/2)}}{1 - e^{-2^{n\epsilon/7}}} \le 2^{-n(H_{Q_Y}(Y) - \epsilon)}.$$

Therefore, 

E[Pr(d c (X n ,v n ) < D e |M = m,C n )|£ c ] 

= 53 Pr(C n |f c )Pr(de(^ n ,^ n ) < D e |M = m,C") 

C'*g£ c 


II 

M 

E 

II 

ET 

Ph 

|5 c )Pr(C"|l^ = 


yn^yr, 

C" e£ c 






Pr(d e (A", 

u n ) < Z? e Af = m,C n ) 

II 

M 

E 

P = y n 

\£ c )Pr (C n \YZ = 

» n ,n 

yn^yr, 







E 

P Xn\ mO^M 




x n : d e (x n , v n ) < D e 




(*" .y")6 T Q xi 



IAS 

M 

E 

Pr(P" = 1/" 

|5 c )Pr(C"|F^ = 

i/",£ c ) 


yn e yn C ™ e £ c 


E 2 


-n(ff Qxr (A'|F)-3e) 


x n :d e (x n ,v n )<D e 
(x n ,y n )£T QxY 


= E Pr«=t/"in E 2 


-n(-ff Oxv (X|l0-3e) 




x n :d e (x n ,v n )<D e 
(x n ,y n )£T QxY 


= 

E 

53 Pr(YZ=y n \£ c y 

X 71 

- :d e (x n ,v r 

7 )< D „ y n £T QY ^ x(ni n ) 



2 -n(H QxY (A|r)-3e) 

< 

E 

53 2 - "( flQ x( r ) -e / * 2 )- 

x n 

1 :d e (x n ,v T 

7 )<D C y*€T QY]xixn) 



2 -n(H QxY (X\Y)-3e) 

< 

E 

2 nH QxY (Y\X) 2 -n(H QxY (X,Y)-7e/2) 

X 71 

- :d e ( x n ,v r 

7 )<De 

= 

E 

2 -n(H Qx (X)~ 7e/2) 

X 71 

■:d e (x' n ,v r 

7 )<D e 

(b) 

< 

max 

2 n(H Pxv (X\V)+e/2) 2 -n(H Qx (X)-7e/2) 


where the second inequality follows from the union bound 
and ( |3TT >. Now, fix {Cjf}^ £ fl^E £'/ m £ [AT], and v n £ 
V”, and suppose K = fco- Then, 

Pr(d e (A",i;") < D e |M = m,K = k 0 , {C^tZ t) 

= Pr(d e (X”,t>”) < D e |M = TO,C^ 0 ) 

< 2 -n(JJ(Qxr,D e )-4e)^ (^5) 


where the inequality follows from (f32[). Furthermore. 


E 


Pr(d,(X",»")<£) e |M=m,A'=to,{CaLi) 


hs 

k =1 


= E 


Pr(d e (A n , u") < -D e |M = m, C k ) 


n £ i 

k =1 


= E [Pr(d e (X>") < D e |M = m,CZ o )\£ c ka ] 

< 2 -'*(S e (Qx,n e )-4e) (g^) 

where the last inequality follows from ( |33| ). Now, consider 

{cnlZ 1 e (ufj^fc) 0 - 

Pr(d e (X”,tF) < H e |M = m,{C£ Ci) 

2 nr 

= 53 Pr(X = j|M = m, {C£}fi> 

3 = 1 

Pr(d e (X”, i/*) < D e |M = m, K = j, {C^C t) 

2 ” 

< 53 2_n ( I '"2 £ )p r (d e (A n ,t; n ) < £> e |M = m,C”), (67) 
f=i 

where the inequality follows from: 

Pr(iT = j|M = m,{CnfJi) 

= Pr(JT = j)Pr(M = m|/i = j, {C£ j^i) 

EE Pr(AT = f)Pr(M = m|X = {C^Ei) 

Pr(M = to | FT = j.Cj 1 ) 

~ EE p r(M = m\K = i,C?) 

E Pr(X n = x n )Pr(M = m\X n =x n ,K= j,C ”) 


a: : 

m£Cj (x 71 


Pxv^Qxv 

< 2 _n (- R e(Qx,-D e )-4e) 

where (a) follows from ( |62[ >, and (b) can be shown analogously 
to ( [T2| ). ■ 


E E Pr(X n =s n )Pr(M=m|X"=a:" I .K'=*,C?) 

£—1 x n : 
m£Ce(x ri ) 

W Ex":meC i (cc’ 1 ) ^ 

V 2nr 0—2 ne 

Z-^£—l Z-^x n :m€Ce(x ri ) 

(b) 2 _ Ti(r— 2fi) 

where (a) follows from the fact that 1 < N x n < 

2 2ne , and (b) follows from the fact that, for any 
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i. E,n :mec .( x „)1 = |{*":(*",C(C j ))eT ( , xv }| = 

l T Q X | y fc(c 3 ')) r l = IJWggI for an y y n e r ar 

Given $\bigl(\bigcup_{k=1}^{2^{nr}} \mathcal{E}_k\bigr)^c$, the terms in the summands of (67) are independent and identically distributed random variables, with an upper bound given by (65), and an expectation upper-bounded by (66). It follows from Chernoff's bound [25, Corollary 2] that


Prjpr(d e pf",t) n ) <D e \M=m,{CZ}lZd>2- n(E - 8t) 


rv; 

*:=i . 


= Pr [ J2 2 ~ n(r ~ 2e)pr (de{X n ,v n ) < D e \M = m,Cj) > 
\i = 1 

n 


2—n(E—8e) 


k =1 


= Pr ( ^Pr(d e (X n ,u") < D e \M = m,C ?) > 
u' =1 

2 7171 \ 


2 ~ n(E—r— 6e) 


fc=i 


< 


2 -n(J5-r-6e) 

e 2 nr 2- n (Re(Qx ,£> e )—4e) \ 2 -n(K(Q X y ,D e )-4e) 


9 —n(E—r—6e) 

< ^g2 -n ^ 1! ( < 2 A '> £) < : ) -£ ' -4e + 6< 0^ 


2 -n(E-r—R(Q X y ,D B )-6e+4e) 


< T 


( 68 ) 


where the last inequality follows from the fact that $R_e(Q_X, D_e) - E \ge 0$ and $E - r - R(Q_{XY}, D_e) \le 0$. By the union bound,


Pr 



< N2~ tn22trl 

< 2«CQ Xy (^;n+e-e 22 ' n ) 


<; 2 n ( lo S l-f l+«-e2 2s ") 2-i rt22e " 


Combining ( [64] > and ( [69] ) yields 

Pr (f) < e - 2 ' 1£/8 +2-i" 22£ " < e" 2 ’ 

as desired. 


(69) 

(70) 


Appendix F 

Proof of Proposition ITol 

Consider the following proposition. 

Proposition 16: Given $\epsilon > 0$, $\beta > 0$, and $R' > \max_{Q:\, D(Q\|P) \le \beta} R(Q, D) =: R_\beta$, there exists $n_4(\epsilon, |\mathcal{X}|, |\mathcal{Y}|, R', d_e)$ such that for all $n \ge n_4$, $D \ge D_{\min}$, $D_e \ge D_{e,\min}$, and for each $Q_X \in \mathcal{Q}^n_X(\beta, 0)$ (cf. (36)),

$$\Bigl| \max_{\substack{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D): \\ I_{Q_{XY}}(X;Y) \le R'}} R(Q_{XY}, D_e) - R(Q_X, R', D, D_e) \Bigr| \le \epsilon.$$

Proof: Note that Proposition 15 is a special case in which $\beta = +\infty$ and $R' \ge \max_Q R(Q, D)$. As such, the proof follows along similar lines as Propositions 15 and 14 but must account for the rate constraint $R'$.

It follows directly from the definition that

$$\max_{\substack{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D): \\ I_{Q_{XY}}(X;Y) \le R'}} R(Q_{XY}, D_e) \le R(Q_X, R', D, D_e).$$

So, we only need to show the reverse direction. To that end, choose $R''$ such that $R_\beta < R'' < R'$. By Proposition 6, $R(Q_X, R, D, D_e)$ is uniformly continuous in $(Q_X, R)$ over the set $\{(Q_X, R) : D(Q_X\|P) \le \beta,\ R'' \le R \le R'\}$. Then let $\delta_1 > 0$ be small enough such that, for all $Q_X \in \mathcal{Q}_X(\beta, 0)$,

$$|R(Q_X, R' - \delta_1, D, D_e) - R(Q_X, R', D, D_e)| \le \epsilon/2. \tag{71}$$

Let $\delta_2 > 0$ be small enough such that

$$\|P_{XY} - P'_{XY}\| \le \delta_2 \;\Rightarrow\; |R(P_{XY}, D_e) - R(P'_{XY}, D_e)| \le \epsilon/2 \ \text{ and } \ |I_{P_{XY}}(X;Y) - I_{P'_{XY}}(X;Y)| \le \delta_1. \tag{72}$$

Let $n \ge n_4 \ge |\mathcal{Y}| \sqrt{|\mathcal{X}||\mathcal{Y}|}/\delta_2$. Fix $Q_X \in \mathcal{Q}^n_X(\beta, 0)$ and let $P^*_{Y|X}$ be the conditional distribution achieving the maximum in $R(Q_X, R' - \delta_1, D, D_e)$. We construct $P'_{Y|X}$ by rounding the values of $P^*_{Y|X}$, as done in Proposition 14. Similarly to Proposition 14, this guarantees that

$$Q_X P'_{Y|X} \in \mathcal{Q}^n_{XY}, \quad \|Q_X P'_{Y|X} - Q_X P^*_{Y|X}\| \le \delta_2, \quad \text{and} \quad E_{Q_X P'_{Y|X}}[d(X,Y)] \le D. \tag{73}$$

Moreover, it follows from (72) and (73) that $I_{Q_X P'_{Y|X}}(X;Y) \le I_{Q_X P^*_{Y|X}}(X;Y) + \delta_1 \le R'$. Therefore,

$$\begin{aligned}
\max_{\substack{Q_{XY} \in \mathcal{Q}^n_{XY}(Q_X, D): \\ I_{Q_{XY}}(X;Y) \le R'}} R(Q_{XY}, D_e)
&\ge R(Q_X P'_{Y|X}, D_e) \\
&\ge R(Q_X P^*_{Y|X}, D_e) - \epsilon/2 \\
&\ge R(Q_X, R', D, D_e) - \epsilon,
\end{aligned}$$

where the second inequality follows from (72) and (73), and the third inequality from (71). ■

The proposition yields 


min D(Q x \\P) + R{Q,#,D,D e )-e 

Qx£Q x (P,o) 

< „ min D(Q X \\P) + R(Q* r ,(Qx),D e ) 

Qx£Q x (P,0) 

< min D(Q x \\P)+R(Q,R!,D,D e ). 

Q x eC x (/3,o) 

By taking the limit as n goes to infinity, and noting that e is 
arbitrary, the proof is concluded. 


Appendix G 

Proofs of Propositions 12 and 13

A. Proof of Proposition 12

We restate the proposition.

Proposition 12: Let $N_1$ and $N_2$ be in $\mathbb{N}$, and let $\mathcal{S}$ and $\mathcal{U}$ be compact subsets of $\mathbb{R}^{N_1}$ and $\mathbb{R}^{N_2}$, respectively. Let $\nu$ be a non-negative continuous function defined on $\mathcal{S} \times \mathcal{U}$, and let $\vartheta$ be a real-valued continuous function defined on $\mathcal{S} \times \mathcal{U}$. Suppose they satisfy the following condition:

(PA) If $(s, u_1) \in \mathcal{S} \times \mathcal{U}$ satisfies $\nu(s, u_1) = \min_{u' \in \mathcal{U}} \nu(s, u')$, then there exists $u_2$ such that $\vartheta(s, u_2) = \vartheta(s, u_1)$, and for all $s' \in \mathcal{S}$, $\nu(s', u_2) = \min_{u' \in \mathcal{U}} \nu(s', u')$.

Let $t_0 = \max_{s \in \mathcal{S}} \min_{u \in \mathcal{U}} \nu(s, u)$, and let $\varphi$ be a function on $\mathcal{S} \times [t_0, +\infty)$ defined as follows:

$$\varphi(s,t) = \min_{u:\, \nu(s,u) \le t} \vartheta(s,u).$$

If for fixed $s \in \mathcal{S}$, $\varphi(s,t)$ is continuous in $t$, then $\varphi(s,t)$ is continuous in the pair $(s,t)$.

First, note that, for all $s \in \mathcal{S}$ and all $t \ge t_0$, $\nu^{-1}(s, [0,t]) = \{u : \nu(s,u) \le t\}$ is closed by continuity of $\nu$, so it is compact since it is also bounded. Moreover, it is non-empty since $t \ge t_0$. Since $\vartheta$ is continuous and the minimization is over a compact set, $\varphi$ is well defined.

Now fix $(s,t) \in \mathcal{S} \times [t_0, +\infty)$, and consider any sequence $(s_k, t_k) \to (s,t)$. Let $t_s = \min_{u \in \mathcal{U}} \nu(s,u)$ and consider any $\epsilon > 0$.

If $t > t_s$: By continuity of $\varphi(s,t)$ as a function of $t$ for fixed $s$, there exists $\delta > 0$ such that $|t - t'| \le \delta \Rightarrow |\varphi(s,t) - \varphi(s,t')| \le \epsilon$. Let $t' = t - \min\{\delta/2, (t - t_s)/2\}$, and let $u' \in \arg\min_{u:\, \nu(s,u) \le t'} \vartheta(s,u)$. Then, $\nu(s,u') \le t' < t$ and $\varphi(s,t') = \vartheta(s,u') \le \varphi(s,t) + \epsilon$.

If $t = t_s$: Let $u'$ be a minimizer for $\varphi(s, t_s)$ satisfying $\nu(s', u') = t_{s'}$ for all $s' \in \mathcal{S}$. Such a choice is possible by assumption (PA). Note that $\vartheta(s,u') = \varphi(s,t_s)$.

We claim that the choice of $u'$ is feasible for the minimization in $\varphi(s_k, t_k)$, i.e., $\nu(s_k, u') \le t_k$ for sufficiently large $k$. Indeed, if $t > t_s$, then $\nu(s_k, u') \to \nu(s, u') \le t' < t$, so for sufficiently large $k$, $\nu(s_k, u') \le t_k$. If $t = t_s$, then $\nu(s_k, u') = t_{s_k} \le t_0 \le t_k$.

Moreover, by continuity of $\vartheta$, $\vartheta(s_k, u') \to \vartheta(s, u')$. Then, for sufficiently large $k$, $\vartheta(s_k, u') \le \varphi(s,t) + 2\epsilon$. Since $\epsilon$ is arbitrary, we get

$$\limsup_{k \to \infty} \varphi(s_k, t_k) \le \limsup_{k \to \infty} \vartheta(s_k, u') \le \varphi(s,t).$$

On the other hand, let $u_k$ be a minimizer for $\varphi(s_k, t_k)$. Consider a sequence of integers $\{k_j\}$ such that

$$\varphi(s_{k_j}, t_{k_j}) \to \liminf_{k \to \infty} \varphi(s_k, t_k).$$

Let $\{u_{k_j}\}$ be the corresponding subsequence of minimizers. Since $\mathcal{U}$ is a bounded set, $\{u_{k_j}\}$ has a convergent subsequence $\{u_{k_{j_\ell}}\}$. Let $u^*$ be its limit. By continuity of $\nu$, we have $\nu(s, u^*) = \lim_{\ell \to \infty} \nu(s_{k_{j_\ell}}, u_{k_{j_\ell}}) \le \lim_{\ell \to \infty} t_{k_{j_\ell}} = t$. Therefore,

$$\begin{aligned}
\varphi(s,t) &\le \vartheta(s, u^*) \\
&= \lim_{\ell \to \infty} \vartheta(s_{k_{j_\ell}}, u_{k_{j_\ell}}) \\
&= \lim_{\ell \to \infty} \varphi(s_{k_{j_\ell}}, t_{k_{j_\ell}}) \\
&= \liminf_{k \to \infty} \varphi(s_k, t_k).
\end{aligned}$$

B. Proof of Proposition 13

We restate the proposition.

Proposition 13: Let $N$ be in $\mathbb{N}$, and let $\mathcal{T}$ be a non-empty compact subset of $\mathbb{R}^N$. Let $L$ be a real-valued continuous function defined on $\mathcal{T}$. Let $T_1 \supseteq T_2 \supseteq \cdots$ be a decreasing sequence of non-empty compact subsets of $\mathcal{T}$. Let $T = \bigcap_{i \ge 1} T_i$. Then,

$$\lim_{k \to \infty} \max_{t \in T_k} L(t) = \max_{t \in T} L(t).$$

Moreover, let $S_1 \subseteq S_2 \subseteq \cdots$ be an increasing sequence of non-empty compact subsets of $\mathcal{T}$. Let $S = \overline{\bigcup_{i \ge 1} S_i}$ (where the bar denotes closure of the set). Then,

$$\lim_{k \to \infty} \max_{t \in S_k} L(t) = \max_{t \in S} L(t).$$

Consequently, if $\mathcal{T}$ is also convex, and $L_c$ is a real-valued convex and continuous function defined on $\mathcal{T}$ with $s_0 = \min_{t \in \mathcal{T}} L_c(t)$, then

$$\tilde{L}(s) := \max_{t:\, L_c(t) \le s} L(t)$$

is continuous in $s \in [s_0, +\infty)$.

First, note that $T$ is non-empty and compact since a countable intersection of non-empty decreasing compact sets is non-empty and compact. Let

$$t_k = \arg\max_{t \in T_k} L(t) \quad \text{and} \quad t^* = \arg\max_{t \in T} L(t).$$

We need to show that $L(t_k) \to L(t^*)$. Let $B_\delta(t) = \{t' \in \mathcal{T} : \|t' - t\| < \delta\}$, and consider the following claim.

Claim 1: For all $\delta > 0$, there exists $k_0$ such that for all $k \ge k_0$, $T_k \subseteq B_\delta(T)$, where

$$B_\delta(T) = \bigcup_{t \in T} B_\delta(t).$$

We show first how the claim yields our result. Let $\epsilon > 0$ be given. By the uniform continuity of $L$ (continuity on a compact set), there exists $\delta > 0$ such that $\|t - t'\| < \delta \Rightarrow |L(t) - L(t')| < \epsilon$. Let $k$ be large enough as guaranteed by the claim. Then, for all $t \in T_k$, there exists $t' \in T$ such that $\|t - t'\| < \delta$, and subsequently $|L(t) - L(t')| < \epsilon$. In particular, there exists $t' \in T$ such that $|L(t_k) - L(t')| < \epsilon$. Then, we get $L(t_k) \le L(t') + \epsilon \le L(t^*) + \epsilon$. Since $L(t_k) \ge L(t^*)$, we get $|L(t_k) - L(t^*)| \le \epsilon$. Therefore, $L(t_k) \to L(t^*)$. It remains to prove the claim to establish the first part of the proposition.

Proof of Claim 1: Fix $\delta > 0$. $B_\delta(T)$ is open in $\mathcal{T}$ by construction. Therefore, $T_k \setminus B_\delta(T)$ is closed in $\mathcal{T}$. Since $\mathcal{T}$ is closed in $\mathbb{R}^N$, $T_k \setminus B_\delta(T)$ is also closed in $\mathbb{R}^N$. Moreover, it is bounded, so it is compact. Since

$$\bigcap_{i \ge 1} \bigl(T_i \setminus B_\delta(T)\bigr) = \Bigl(\bigcap_{i \ge 1} T_i\Bigr) \setminus B_\delta(T) = T \setminus B_\delta(T) = \emptyset,$$

and $T_k \setminus B_\delta(T)$ is a decreasing sequence of compact sets, there exists $k_0$ such that for all $k \ge k_0$, $T_k \setminus B_\delta(T)$ is empty.

Similarly, to prove the second part of the proposition, let

$$s_k = \arg\max_{t \in S_k} L(t) \quad \text{and} \quad s^* = \arg\max_{t \in S} L(t).$$

We need to show that $L(s_k) \to L(s^*)$. To this end, consider the following claim.

Claim 2: For all $\delta > 0$, there exists $k_1$ such that for all $k \ge k_1$, $S \subseteq B_\delta(S_k)$.

We show first how the claim yields our result. Let $\epsilon > 0$ be given. By the uniform continuity of $L$, there exists $\delta > 0$ such that $\|t - t'\| < \delta \Rightarrow |L(t) - L(t')| < \epsilon$. Let $k$ be large enough as guaranteed by the claim. Then, for all $t \in S$, there exists $t' \in S_k$ such that $\|t - t'\| < \delta$, and subsequently $|L(t) - L(t')| < \epsilon$. In particular, there exists $t' \in S_k$ such that $|L(s^*) - L(t')| < \epsilon$. Then, we get $L(s_k) \ge L(t') \ge L(s^*) - \epsilon$. Since $L(s_k) \le L(s^*)$, we get $|L(s_k) - L(s^*)| \le \epsilon$. Therefore, $L(s_k) \to L(s^*)$. It remains to prove the claim.

Proof of Claim 2: Fix $\delta > 0$. $B_\delta(S_k)$ is open in $\mathcal{T}$ by construction. Therefore, $S \setminus B_\delta(S_k)$ is closed in $\mathcal{T}$. Then $S \setminus B_\delta(S_k)$ is closed in $\mathbb{R}^N$. Moreover, it is bounded, so it is compact. Since

$$\bigcap_{i \ge 1} \bigl(S \setminus B_\delta(S_i)\bigr) = S \setminus \bigcup_{i \ge 1} B_\delta(S_i) = \overline{\bigcup_{i \ge 1} S_i} \setminus B_\delta\Bigl(\bigcup_{i \ge 1} S_i\Bigr) = \emptyset,$$

and $S \setminus B_\delta(S_k)$ is a decreasing sequence of compact sets, there exists $k_1$ such that for all $k \ge k_1$, $S \setminus B_\delta(S_k)$ is empty.

Finally, consider $\tilde{L}(s)$. If $L_c$ is a constant function, then the statement is trivial. If not, consider $s \ge s_0$, and let $s_k$ be a decreasing sequence converging to $s$. Then,

$$\lim_{k \to \infty} \tilde{L}(s_k) = \lim_{k \to \infty} \max_{t:\, L_c(t) \le s_k} L(t) = \max_{t:\, L_c(t) \le s} L(t) = \tilde{L}(s),$$

where the second equality follows from the first part of the proposition. Therefore, $\tilde{L}(s)$ is right-continuous. Now, consider $s > s_0$, and let $s_k$ be an increasing sequence converging to $s$. Note that

$$\bigcup_{k \ge 1} \{t \in \mathcal{T} : L_c(t) \le s_k\} = \{t \in \mathcal{T} : L_c(t) < s\}.$$

Denote the above set by $S^-$ and let $S = \{t \in \mathcal{T} : L_c(t) \le s\}$. The second part of the proposition implies that

$$\lim_{k \to \infty} \tilde{L}(s_k) = \lim_{k \to \infty} \max_{t:\, L_c(t) \le s_k} L(t) = \max_{t \in \overline{S^-}} L(t).$$

So it suffices to show that $\overline{S^-} = S$. Clearly, $\overline{S^-} \subseteq S$ since $S$ is closed and $S^- \subseteq S$. It remains to show that any point $t$ satisfying $L_c(t) = s$ is a boundary point of $S^-$. To that end, note that $L_c(t)$ is not a local minimum since $L_c(t) = s > s_0$ and $L_c$ is convex by assumption. Therefore, any neighborhood of $t$ intersects $S^-$. As such, $\overline{S^-} = S$, and $\tilde{L}(s)$ is left-continuous, as desired. ■
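
The first part of Proposition 13 can be illustrated on a one-dimensional toy instance. The sketch below assumes the ambient set $[0, 2]$, the nested sets $T_k = [0, 1 + 1/k]$ with intersection $T = [0, 1]$, and the function $L(t) = t(t-1)$; these choices, the grid resolution, and the function names are illustrative assumptions rather than anything taken from the paper.

```python
def L(t):
    """A continuous test function on [0, 2]; its maximum over [0, 1] is 0."""
    return t * (t - 1.0)

def grid_max(func, upper, lo=0.0, hi=2.0, points=20001):
    """Approximate max of func over [lo, min(upper, hi)] using a uniform grid on [lo, hi]."""
    step = (hi - lo) / (points - 1)
    return max(func(lo + i * step) for i in range(points) if lo + i * step <= upper)

# Maxima over the shrinking sets T_k = [0, 1 + 1/k] approach the maximum over T = [0, 1].
limit_value = grid_max(L, 1.0)
for k in (1, 10, 100, 1000):
    print(k, grid_max(L, 1.0 + 1.0 / k), limit_value)
```

Since the sets are nested, the printed maxima are non-increasing in $k$, and their gap to the limiting value shrinks roughly like $1/k$ for this particular $L$.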
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